Many accident investigations make the same mistake in defining causes.
They identify the widget that broke or malfunctioned, then locate the 
person most closely connected with the technical failure: the engineer 
who miscalculated an analysis, the operator who missed signals or 
pulled the wrong switches, the supervisor who failed to listen, or the 
manager who made bad decisions. When causal chains are limited to 
technical flaws and individual failures, the ensuing responses aimed at 
preventing a similar event in the future are equally limited: they aim to 
fix the technical problem and replace or retrain the individual respon- 
sible. Such corrections lead to a misguided and potentially disastrous 
belief that the underlying problem has been solved. (Columbia Accident 
Investigation Board, 2003, p. 97) 


I'll never forget the day my phone rang early on a Sunday morning in 2006. 
The voice on the other end of the line informed me there had been an air- 
line crash in Lexington, Kentucky. The airplane was still on fire and mul- 
tiple fatalities were expected. With that, I started packing my bags to head 
to Kentucky. 

Before even leaving my house, the images on TV pretty much told me what 
had happened. The wreckage was positioned a few thousand feet directly off 
the end of a runway that would have been too short for an airplane of that 
size to successfully take off. A broken fence at the end of the runway and tire
marks through the grass to the initial impact point provided further clues. 
So, before even leaving my house, I had surmised the pilots made an error of 
attempting to depart from the wrong runway. 

Error identified, case closed. Right? 

Well, actually not. A good friend, Captain Daniel Maurino, stated “the 
discovery of human error should be considered as the starting point of the 
investigation, and not the ending point that has punctuated so many previ- 
ous investigations” (Maurino, 1997). His words of wisdom are framed in my 
office to serve as a constant reminder of the necessity to look behind the 
obvious human error. It is one thing to say someone committed an error, but 
it is quite another to try to identify the underlying factors that influenced 
that error. And why do we care? Finding who or what is at “fault” should 
not be simply an exercise in attributing error, but rather, should be for the 
purpose of identifying the factors that influenced the error so those condi- 
tions can be corrected to prevent future errors. If we simply say “human 
error,” “pilot error,” or “operator error,” and stop with that, we miss valuable 
learning opportunities. The Institute of Medicine noted in a seminal report 
on medical error that “blaming the individual does not change these factors 
and the same error is likely to recur” (Institute of Medicine, 2000, p. 49). 
In the case of the Lexington, Kentucky crash, the error was identified
within hours, if not minutes, after it occurred. But, identifying the human 
error doesn’t mean the investigation is completed; instead, it should be, as 
Daniel Maurino stated, the starting point of the investigation. 

Once the human error was identified, the prevailing question should (and 
did) become “Why was the error committed?” Were the pilots fatigued? Did 
the fact that the airport was undergoing construction of runways and taxi- 
ways somehow confuse the pilots during taxi-out? How did the disparity 
between taxiway signs and what was depicted on the pilots’ airport dia- 
gram charts affect their performance? Did organizational factors such as 
poor training or lack of company standardization somehow contribute to 
the error? What role did understaffing in the control tower play? Did the 
crew’s casual attitude enable their error? Why did two other flights success- 
fully navigate the airport construction and taxi to the correct runway in the 
moments before the crash, but this crew did not? Only after questions such 
as these are answered can the human error be understood and the underly- 
ing conditions corrected. 

Since that accident in 2006, I’ve been involved in the deliberation of some 
150 or so transportation accidents. From that experience, I have developed 
the belief that most, if not all, accidents or incidents have roots in human 
error. In some cases, it is a readily identifiable error of a frontline opera- 
tor, such as a pilot, ship’s master, medical technician, air traffic controller, or 
control room operator. In other cases, the error(s) may not be obvious at all. 
It may be deeply embedded within the system, perhaps far, far away from 
the scene of the accident, such as decisions and actions/inactions made by 
organizations or regulators. As explained in this book, there are proximate 
errors—those that are closest to the accident in terms of timing or location, 
and there are underlying conditions that are factors in the accident causa- 
tion, but perhaps not readily apparent. Reason (1990, 1997) refers to these as 
active failures, and latent conditions, respectively. 

Contemporary thinking views error as a “symptom of deeper trouble” 
(Dekker, 2002, p. 61) within the system. Maurino said human error should 
be “considered like fever: an indication of illness rather than its cause. It is a 
marker announcing problems in the architecture of the system” (Maurino, 
1997). 

In the early 1990s, then National Transportation Safety Board (NTSB) 
board member John Lauber was one of the first to focus on how organiza- 
tional factors can influence transportation safety (Meshkati, 1997). Lauber 
argued that the cause of a commuter airliner in-flight breakup due to faulty 
maintenance should be “the failure of Continental Express management to 
establish a corporate culture which encouraged and enforced adherence 
to approved maintenance and quality assurance procedures” (NTSB, 1992, 
p. 54). Of the five NTSB board members, Lauber was alone in his belief. The 
conventional thinking at the time seemed to be to identify the proximate 
error that sparked the accident and call that the “cause” of the mishap. But,
as discussed throughout this book, human error does not occur in a vacuum.
It must therefore be examined in the context in which the error occurred. 
In other words, if an error occurs in the workplace, the workplace must be 
examined to look for conditions that could provoke error. What were the 
physical conditions at the workplace? Was lighting adequate to perform the 
task? Were the procedures and training adequate? Did the organizational 
norms and expectations prioritize safety over competing goals? Was the 
operational layout of the workplace conducive to error?

Organizational factors have been implicated in accidents and incidents in 
many socio-technical industries. For example, the U.S. Chemical Safety and 
Hazard Investigation Board (Chemical Safety Board [CSB]) determined that 
a 2005 oil refinery explosion that claimed 15 lives and injured 180 people had 
numerous organizational-related factors, such as the company’s cost cutting 
and overreliance on misleading safety metrics (CSB, 2007). The International 
Atomic Energy Agency (IAEA) stated that the Chernobyl nuclear power 
plant meltdown “flowed from a deficient safety culture, not only at the 
Chernobyl plant, but throughout the Soviet design, operating, and regula- 
tory organizations for nuclear power” (IAEA, 1992, pp. 23-24). The National 
Transportation Safety Board (2010) found organizational issues to be a causal 
factor in the 2009 multi-fatality subway accident in Washington, DC. 

In 2015, I was involved in the final deliberation of an accident involving 
SpaceShipTwo, a commercial space vehicle that suffered an in-flight breakup 
during a test flight. From the onboard video recorder, it was evident that the 
copilot prematurely moved a lever which led to an uncommanded move- 
ment of the vehicle's tail feather—a device similar to a conventional aircraft's 
horizontal and vertical stabilizer. The tail feather is actuated by a cockpit 
lever to pivot it upward 60° relative to the longitudinal axis of the aircraft;
its purpose is to stabilize the aircraft during the reentry phase of flight. 
However, if the feather is deployed at the wrong time, as in this case, the 
resulting aerodynamic loads on the aircraft will lead to catastrophic in-flight 
breakup. The obvious "cause" of the accident was that the copilot committed 
an error of unlocking the feather at the wrong time which led to the uncom- 
manded actuation of the feather. But, this finding alone would serve no use- 
ful purpose for preventing similar errors in the future. After all, the copilot 
was killed, so surely he would not commit this error again. 

By digging deeper, the investigation found that influencing the copilot's 
error was the high workload he was experiencing during this phase of 
flight, along with time pressure to complete critical tasks from memory— 
all while experiencing vibration and g-loads that he had not experienced 
recently. On the broader perspective, the spaceship designer/manufac- 
turer, Scaled Composites, did not consider that this single error could lead 
to an unintended feather activation. Although the copilot had practiced 
for this flight several times in the simulator, this premature movement 
of the feather unlock lever occurred on the fourth powered flight of the
SpaceShipTwo, indicating to me that the likelihood of this error was high. 
"By not considering human error as a potential cause of uncommanded
feather extension of the SpaceShipTwo, Scaled Composites missed oppor- 
tunities to identify the design and/or operational requirements that could 
have mitigated the consequences of human error during a high workload 
phase of flight" (NTSB, 2015, p. 67). Because of the underlying design 
implications, the National Transportation Safety Board issued a safety
recommendation to the Federal Aviation Administration (FAA) to ensure that
commercial space flight entities identify and address "single flight crew 
tasks that, if performed incorrectly or at the wrong time, could result in 
a catastrophic hazard" (NTSB, 2015, p. 70). In addition, the manufacturer 
added a safety interlock to ensure that this lever could not be activated 
during this critical flight regime. 

Not only can organizations create error provoking conditions, but regula- 
tors can do so as well. Examples include failing to provide adequate oversight 
and enforcement, or not developing adequate procedures. In 2010, a Pacific
Gas & Electric Company (PG&E) 30-inch diameter natural gas transmission 
pipeline ruptured and exploded. The conflagration claimed eight lives in 
San Bruno, California, destroyed 38 homes and damaged 70. The investiga- 
tion determined that oversight and enforcement by both the state and federal 
regulators was ineffective, which "permitted PG&E's organizational failures 
to continue over many years" (NTSB, 2011, p. 126). 

So, as you can see, human behavior, including errors, can be influenced by 
many factors. Therefore, investigation of human error should not be a ran- 
dom hit or miss process. It should be conducted in an organized, methodical 
process, with a clear purpose in mind. Dr. Barry Strauch has been on the 
frontlines of investigating human error for nearly 35 years and he has pro- 
vided human factors expertise to well over a hundred aviation and maritime 
accidents. Between the covers of this book, he lays out in clear terms the fac- 
tors that enable human error, including individual factors such as fatigue, 
stress, and medical factors. He also examines in detail organizational and 
regulatory precursors to error. Each chapter provides a bulleted checklist to 
facilitate identifying relevant factors. This second edition provides an update 
to what I found to be an excellent reference—one that I often referred to in 
my decade of serving on an accident investigation board, as indicated by 
scores of dog-eared pages filled with underlining and highlighting. I encour- 
age anyone involved with investigating any type of error—whether that error 
occurred in the hospital, on the hangar floor, in a nuclear control room, or 
on the flight deck of an airliner—to use this text as a resource for investigating
human error. Using this book as a guide—I assure you—will not be an error. 


Robert L. Sumwalt 
Washington, DC 
Foreword to First Edition





Like the rest of the modern world, I owe an enormous debt to the skills 
of professional accident investigators. As a traveler and a consumer, I am
extremely grateful for what they have done to make complex technologies 
significantly safer; but as an academic, I have also been especially dependent 
on their published findings. Although, mercifully, I have had very little first- 
hand experience of the real thing, this has not prevented me from writing, 
lecturing, and theorizing about the human contribution to the breakdown of 
complex systems for the past 30 years or so. There are perhaps two reasons 
why I have so far been able to pull this off. The first is that the ivory tower 
provided the time and resources to look for recurrent patterns in a large 
number of adverse events over a wide range of hazardous technologies, a 
luxury that few “real-world” people could enjoy. The second has been the 
high quality of most major accident reports. If such accounts had later been 
shown to lack accuracy, insight, analytical depth, or practical value, then 
my reliance upon them would have been foolish or worse. But while many 
have challenged the theories, very few have questioned the credibility of the 
sources. 

So, you might ask, if accident investigators are doing so well, why do they 
need this book? The most obvious answer is that human, organizational, and 
systemic factors, rather than technical or operational issues, now dominate 
the risks to most hazardous industries—yet the large majority of accident 
investigators are technical and operational specialists. Erik Hollnagel (1993) 
carried out a survey of the human factors literature over three decades to 
track the increasing prominence of the “human error” problem. In the 1960s, 
erroneous actions of one kind or another were estimated as contributing 
around 20% of the causal contributions to major accidents. By the 1990s, how- 
ever, this figure had increased fourfold. One obvious explanation is that the 
reliability of mechanical and electronic components has increased markedly 
over this period, while complex systems are still being managed, controlled, 
and maintained by Mark I human beings. 

In addition, this period has seen some subtle changes in the way we per- 
ceive the “human error” problem and its contribution to accidents. For the 
most part, “human error” is no longer viewed as a single portmanteau cate- 
gory, a default bin into which otherwise unexplained factors can be dumped. 
We now recognize that erroneous actions come in a variety of forms and 
have different origins, both in regard to the underlying psychological mech- 
anisms and their external shaping factors. It is also appreciated that front- 
line operators do not possess a monopoly on error. Slips, lapses, mistakes, 
and violations can occur at all levels of the system. We are now able to view 
errors as consequences rather than sole causes, and see frontline operators
more as the inheritors rather than the instigators of accidents in complex
systems. 

System complexity derives in large part from the existence of diverse and 
redundant layers of defences, barriers, and safeguards that are designed to 
prevent operational hazards from coming into damaging contact with people,
assets, and the environment. The nuclear industry calls them
“defenses-in-depth.” Such characteristics make it highly unlikely that accidents in
complex systems arise from any single factor, be it human, technical, or envi- 
ronmental. The apparently diabolical conjunction of several different factors 
is usually needed to breach all of these defenses-in-depth at the same time. 
This makes such events less frequent, but the causes more complex. Some 
of the latent contributions have often lain dormant in the system for many 
years prior to the accident. Given the increasing recognition that contribut- 
ing factors can have both a wide scope and a long history, it is almost inevi- 
table that investigators will net larger numbers of human and organizational 
shortcomings. 

Another associated change—at least within the human factors and inves- 
tigative communities—has been a shift away from the “person model” of 
human error, in which the search for causes and their countermeasures is 
focused almost exclusively upon the psychology of individuals. Instead, there 
has been an increasing willingness to take a systems view of accident causa- 
tion in which the important question is not “Who blundered?” but “How and 
why did the defenses fail?” Unfortunately, the person model is still deeply 
embedded in the human psyche, and is especially pernicious in its moral (or 
legal) form. This is the widespread belief that responsible and highly trained 
professionals (pilots, surgeons, ship's officers, control room operators, and 
the like) should not make errors. However, when such erroneous actions do 
occur, it is assumed that they are sufficient to cause bad accidents. The reality, 
of course, is quite different. Highly trained, responsible professionals make 
frequent errors, but most are inconsequential, or else they are detected and 
recovered (see, e.g., Amalberti and Wioland, 1997). Moreover, these errors are 
only occasionally necessary to add the final touches to an accident-in-waiting, 
a potential scenario that may have been lurking within a complex system for 
a long time. 

The achievements of accident investigators are all the more remarkable 
when one considers the snares, traps, and pitfalls that lie in their path. Aside 
from the emotional shock of arriving at often inaccessible and hostile loca- 
tions to confront the horrors of an accident site, investigators are required to 
track backward—sometimes for many years—in order to create a coherent, 
accurate, and evidence-based account of how and why the disaster occurred, 
and to make recommendations to prevent the recurrence of other tragedies. 
The first and most obvious problem is that the principal witnesses to the acci- 
dent are often dead or incapacitated. But this, as most investigators would 
acknowledge, goes with the territory. Other difficulties are less apparent and 
have to do with unconscious cognitive biases that influence the way people 
arrive at judgments about blame and responsibility and cause and effect.
While human factors specialists have focused mainly upon the error tenden- 
cies of the operators of complex systems, there has also been considerable 
interest in how people trying to make sense of past events can go astray. Let 
me briefly review some of these investigative error types. They fall into two 
related groups: those that can bias attributions of blame and responsibility 
and those that can distort perceptions of cause and effect. 

Here are some of the reasons why the urge to blame individuals is so 
strong. When looking for an explanation of an occurrence, we are biased to 
find it among human actions that are close in time and place to the event, 
particularly if one or more of them are considered discrepant. This leads to 
what has been termed the counterfactual fallacy (Miller and Turnbull, 1990) 
where we confuse what might have been with what ought to have been, par- 
ticularly in the case of bad outcomes. The fallacy goes as follows: Had things 
been otherwise (i.e., had this act not happened), there would have been no 
adverse result; therefore, the person who committed the act is responsible 
for the outcome. 

Another factor that leads to blaming is the fundamental attribution error 
(Fiske and Taylor, 1984). This is the universal human tendency to resort to 
dispositional rather than to situational influences when explaining people's 
actions, particularly if they are regarded as unwise or unsafe. We say that the 
person was stupid or careless; but, if the individual in question were asked, 
he or she is most likely to point to the local constraints. The truth usually lies 
somewhere in between. 

The just world hypothesis (Lerner, 1970)—the view that bad things happen
to bad people, and conversely—comes into play when there is an especially
unhappy outcome. Such a belief is common among children, but it can
often last into adulthood. A close variant is the representativeness heuristic 
(Tversky and Kahneman, 1974) or the tendency to presume a symmetrical
relationship between cause and effect—thus bad consequences must be 
caused by horrendous blunders, while particularly good events are seen as 
miracles. 

Yet another reason why people are so quick to assign blame is the illusion
of free will (Lefcourt, 1973). People, particularly in western cultures, place
great value in the belief that they are the controllers of their own fate. They 
can even become mentally ill when deprived of this sense of personal free- 
dom. Feeling themselves to be capable of choice naturally leads them to 
assume that other people are the same. They are also seen as free agents, 
able to choose between right and wrong, and between correct and erroneous 
actions. But our actions are often more constrained by circumstances than 
we are willing to admit or understand. 

All accident investigators are faced with the task of digitizing an essentially 
analog occurrence; in other words, they have to chop up continuous and 
interacting sequences of prior events into discrete words, paragraphs,
conclusions, and recommendations. If one regards each sequence as a piece of
string (though it is a poor analogy), then it is the investigator’s task to tie
knots at those points marking what appear to be significant stages in the 
development of the accident. Such partitioning is essential for simplifying 
the causal complexity, but it also distorts the nature of the reality (Woods, 
1993). If this parsing of events correctly identifies proper areas for remedia- 
tion, then the problem is a small one; but it is important for those who rely 
on accident reports to recognize that they are—even the best of them—only 
a highly selected version of the actuality, and not the whole truth. It is also 
a very subjective exercise. Over the years, I have given students the task of 
translating these accident narratives into event trees. Starting with the acci- 
dent itself, they were required to track back in time, asking themselves at each 
stage what factors were necessary to bring about the subsequent events—or, 
to put it another way, which elements, if removed, would have thwarted the 
accident sequence. Even simple narratives produced a wide variety of event 
trees, with different nodes and different factors represented at each node. 
While some versions were simply inaccurate, most were perfectly acceptable 
accounts. The moral was clear: the causal features of an accident are to the 
analyst what a Rorschach test (inkblot test) is to the psychologist—some- 
thing that is open to many interpretations. The test of a good accident report 
is not so much its fidelity to the often-irrecoverable reality, but the extent to 
which it directs those who regulate, manage, and operate hazardous tech- 
nologies toward appropriate and workable countermeasures. 

A further problem in determining cause and effect arises from the human 
tendency to confuse the present reality with that facing those who were 
directly involved in the accident sequence. A well-studied manifestation 
of this is hindsight bias or the knew-it-all-along effect (Fischhoff, 1975; Woods 
et al., 1994). Retrospective observers, who know the outcome, tend to exag- 
gerate what the people on the spot should have appreciated. Those looking 
back on an event see all the causal sequences homing in on that point in time 
at which the accident occurred; but those involved in the prior events, armed 
only with limited foresight, see no such convergence. With hindsight, we 
can easily spot the indications and warning signs that should have alerted 
those involved to the imminent danger. But most “warning” signs are only 
effective if you know in advance what kind of accident you are going to have. 

Sidney Dekker (2001) has added two further phenomena to this catalog of
investigative pitfalls: he termed them micro-matching and cherry-picking. Both 
arise, he argues, from the investigator’s tendency to treat actions in isolation. 
He calls this “the disembodiment of human factors data.” Micro-matching is a 
form of hindsight bias in which investigators evaluate discrete performance 
fragments against standards that seem applicable from their after-the-fact 
perspective. It often involves comparing human actions against written 
guidance or data that were accessible at the time and should have indicated 
the true situation. As Dekker puts it: “Knowledge of the ‘critical’ data comes
only with the omniscience of hindsight, but if data can be shown to have 
been physically available, it is assumed that it should have been picked up by 
the practitioners in the situation.” The problem, he asserts, is that such
judgments do not explain why this did not happen at the time. Cherry-picking,
another variant of hindsight bias, involves identifying patterns of isolated 
behavioral fragments on the basis of post-event knowledge. This grouping 
is not a feature of the reality, but an artifact introduced by the investigator. 
Such tendencies, he maintains, derive from the investigator’s excessive reli- 
ance upon inadequate folk models of behavior, and upon human reactions 
to failure. Fortunately, he outlines a possible remedy “in the form of steps 
investigators can take to reconstruct the unfolding mindset of the people 
they are investigating, in parallel and tight connection with how the world 
was evolving around these people at the time.” 

Clearly, accident investigators need help in making sense of human factors 
data. But I am not sure that Olympian pronouncements (or even Sinaian
tablets) are the way to provide it, nor am I convinced that investigators can ever
“reconstruct the unfolding mindset of the people they are investigating”—I
can't even construct my present mindset with any confidence. This book, on 
the other hand, delivers the goods in a way that is both useful and meaning- 
ful to hard-pressed accident investigators with limited resources. It is well 
written, well researched, extremely well informed, and offers its guidance 
in a down-to-earth, practical, and modular form (i.e., it can be read via the
contents page and index rather than from cover to cover). It is just the thing, 
in fact, to assist real people doing a vital job. And, as far as I know, there is 
nothing else like it in the bookshops. 


James Reason 
I have seen many changes in the understanding of human error as well as
in the role that it plays in accident causation in the 15 years since this book 
was first published. In that time, considerable research has been conducted 
in such areas as automation, team performance, safety management, and 
fatigue, research that has given investigators additional knowledge with 
which to assess the causes of human error. In this interval, we have also 
witnessed a worldwide decline in major aircraft accidents. Unfortunately, 
some of the accidents that have occurred since then appear to have been 
influenced by the same antecedents to error that we have seen all too fre- 
quently over the years. For example, the accident used in the case study in 
Chapter 16, while more current and with more complex errors than was true 
of the case study in the first edition, illustrates automation-related errors that 
are almost identical to those seen in previous accidents, including one com- 
mitted almost 30 years earlier. Because people and the systems they operate 
do not always learn from their mistakes, the need for thorough and system- 
atic human factors investigations of error becomes that much more critical. 
Hopefully, the lessons to be learned from these investigations can be used to 
avoid further accidents. 

Not only have operators continued to make errors that have led to acci- 
dents, but mishaps in which the antecedents were well known but ignored 
have occurred in other systems as well. For example, the case of Bernard 
Madoff, whose Ponzi scheme cost many investors their life savings, illus- 
trates how ineffective oversight can exacerbate system errors. The U.S. regu- 
lator of financial securities had been informed of the Madoff Ponzi scheme 
well before the scheme was exposed, yet nothing was done to stop him, 
despite its own (flawed) investigation and the presence of publicly available 
information that could have pointed out the fraud. The regulator did not 
cause the scheme, but by failing to properly oversee the financial system 
in which it operated, it contributed to losses of millions of investor dollars 
beyond what would have been the case had it acted effectively when it ini- 
tially learned of the scheme. 

In the years since the first edition, we have also witnessed the world’s 
third major civilian nuclear reactor accident, the March 2011 meltdown in 
the Fukushima Daiichi nuclear generating plant in Japan. A reactor core 
melted after coolant ceased flowing to it, following flooding of the backup 
diesel generators, a result of a devastating tsunami. Because the plant was 
located in a seismic zone, near the ocean, it was potentially prone to tsu- 
namis. Regulators therefore required protection against them. In addition 
to building a seawall, designers had installed backup generators to enable 
coolant to be pumped to the core in the event that primary power was lost. 
However, although backup generators were required and installed, designers
and regulators failed to consider the possibility of a tsunami of sufficient
magnitude that would exceed the seawall limits and flood the backup gen- 
erators, which had been placed at ground level behind the seawall. In this 
manner, designers, regulators, operators, and all who play integral roles in 
complex systems, have continued to create antecedents to errors that appear, 
in hindsight, to have been preventable. 

While this book does not attempt to provide foresight to those design- 
ing or operating complex systems, I do hope that it will provide knowledge 
needed to effectively investigate the results of their errors to identify their 
antecedents. Because of the findings of both accident investigations and 
of the human factors research conducted in the interim, we have a better 
understanding of error causation than we did 15 years ago. We know more, 
for example, than we did then about automation’s effects on operators,
how fatigue can adversely affect cognitive performance, and how
organizations can contribute to operator errors, and the research cited in this
text reflects these advances.

My experience as an investigator, with now over 30 years of conduct- 
ing error investigations in major modes of transportation, has reinforced 
my belief that error investigators need to be aware of basic human factors 
research findings. The ability to identify necessary data, together with
interviewing and analytical skills, is essential. But without an understanding
of human error, even the most skilled investigators will have difficulty
explicating the error causation in the accidents they investigate.

In the first edition, I suggested that accident investigation is a special calling,
and my experience since then has only reinforced that view. To be able to
understand and identify errors in a way that is constructive, and that can be
used to prevent accidents, is indeed a privilege. I hope that you will find this
text helpful in your own endeavors. Should you investigate
an accident, I hope that you will make a positive contribution to safety by
helping to reduce the likelihood of similar accidents in the future.

Finally, this book is dedicated to the memory of Howard B. Brandon, Jr., 
whose father has been a colleague, mentor, and friend. May Howard rest in 
peace. 


Barry Strauch 
Annandale, Virginia 

From the time I was a boy growing up in Brooklyn, I have been fascinated
with New York’s subway system. When I was 11, I began a tradition that 
lasted 3 years. To celebrate the last day of school, I would ride at the head end 
of a subway train on a route that I had not taken before. I never told my par- 
ents. I doubt they would have understood. I loved the subways and riding 
at the very front of the train allowed me to see not only the track ahead but 
also to watch the operator, then called the motorman. I would stand there for 
hours, fascinated watching the train’s movements and the train operator as 
he would move the train forward and then slow it down and stop it at each 
station. 

From these beginnings, my interest in complex systems and especially 
transportation systems has grown. I later became fascinated with another 
system, aviation, and I tried to learn as much as I could about that field. After 
completing graduate school, I indulged myself by learning to fly. I became 
hooked. All of my free time and disposable income went to pay for lessons 
and flight time. After several years, I was fortunate that I could afford to 
acquire several pilot ratings. I even briefly considered trying to become an 
airline pilot. However, the airlines weren't hiring many pilots in those days 
and I had to enter the field another way. I became an accident investigator 
with the National Transportation Safety Board (NTSB). 

I joined the NTSB as a human performance investigator in 1983, with sev- 
eral other young human factors professionals. We were among the first at 
the agency, or anywhere for that matter, to systematically examine the role of 
operator error in accidents in complex systems. The agency wanted us to look
beyond attributing an accident solely to operator error, the standard practice
of the day, and to provide more insight into its cause.

The NTSB was, and is, a special place. Its investigators are thoroughly 
dedicated to its mission—to learn what causes an accident in order to pre- 
vent future accidents. Often at considerable personal sacrifice, they travel 
to inhospitable locales and work under great stress, to get to the bottom of 
terrible tragedies. 

In those days, there wasn’t much to guide us beyond the standard 
human factors design texts. Researchers at NASA Ames had been actively 
engaged in studying team errors and crew resource management for sev- 
eral years, but the fruits of their efforts would still be several years away. 
The Danish researcher Jens Rasmussen, and the British researchers James 
Reason, Neville Moray, and their colleagues in Europe were just beginning 
to examine error as a systems construct, after the nuclear accident at Three 
Mile Island. Elsewhere, human error was only just beginning to
emerge as a field worthy of extensive study in and of itself.

Much has happened to the field of human error since 1983, and to me as
well. I have held a variety of positions at the NTSB, all related to either inves- 
tigating or training others to investigate error, in both the United States and 
abroad. I have met many involved in transportation safety, all as dedicated 
and committed as my colleagues at the NTSB. But many have asked the same 
question—given the prominence of human error in the cause of accidents, is 
there anything written on how to investigate error? Unfortunately, I would 
have to answer that, although there was much written on error, little was 
available to explain how to investigate it. 

I wrote this text to remedy that situation. I have based it not on any formal 
method that the NTSB has adopted, but on my own reading, experience, and 
belief in what works. It is intended for those who are interested in human 
error and for those who investigate errors in the course of an incident or 
accident investigation.*

I am indebted to many people who have helped me along the way and
without whose help this text would not have been possible. Although I cannot
name them all, I would like to thank several whose assistance was invaluable.
Dr. Michael Walker, of the Australian Transport Safety Bureau,
commented on the organization of the text when it was still in its formative
stage. Drs. Evan Byrne and Bart Elias of the National Transportation
Safety Board provided beneficial comments and suggestions on an early
draft. Dr. Douglas Wiegmann, of the University of Illinois, took time out
from his schedule to review a draft, and his comments are greatly appreciated.
The questions that Dr. John Stoop, of the Delft University of Technology
in the Netherlands, raised were incisive and helped guide my thinking on
subsequent drafts. Dr. Mitchell Garber, the medical officer of the National
Transportation Safety Board, meticulously read and offered suggestions on
several drafts. His guidance went well beyond medical and human factors
issues and greatly improved both the content and structure of the text. My
editor, Ms. Joanne Sanders-Reio, worked with me to arrange my thoughts
and, more important, helped to refine and organize the text. Carol Horgan
reviewed the final draft for clarity. My publisher, John Hindley, provided
ongoing support and encouragement from the beginning. Professor James
Reason provided invaluable encouragement in these efforts.

I am especially indebted to my wife Maureen, my son Sean, and my 
daughter Tracy. They have put up with the more than three and a half years that I
have spent on this project, with the attendant absences from their lives and
frustrations these efforts produced. Without their patience and encouragement
this book would not have been possible.





* The text reflects my views and opinions, and not necessarily those of the National 
Transportation Safety Board. 

Introduction











The ValuJet accident continues to raise troubling questions—no longer 
about what happened but about why it happened, and what is to keep 
something similar from happening in the future. As these questions 
lead into the complicated and human core of flight safety, they become 
increasingly difficult to answer. 


Langewiesche, 1998 
The Atlantic Monthly 





Introduction 


“To err is human,” it is said, and people make mistakes; it is part of the
human condition. When people err, they may be embarrassed or angry with 
themselves, but most often the errors are minor and little attention is paid 
to the consequences. However, sometimes the errors lead to more serious 
consequences. Occasionally, people working in hospitals, airlines, power sta- 
tions, chemical refineries, or similar settings commit errors—errors that may 
cause accidents with catastrophic consequences, potentially leading to injury 
or death to those who played no part in the error. 

Such work settings, known as “complex systems” (Perrow, 1999), generate 
electricity, refine crude oil, manage air traffic, transport products and people, 
and treat the sick, to name a few. They have brought substantial benefits 
to our way of life, and permitted a standard of living to which many have 
become accustomed, but when someone who works in these systems makes 
an error, the consequences may be severe. Companies and their
regulators typically establish extensive performance standards to prevent
errors, yet errors that would be inconsequential in other environments
can still cause serious harm in these settings.

A new catastrophe seems to occur somewhere in the world with regularity,
often one that is later attributed to someone doing something wrong.
Whether it is an airplane accident, a train derailment, a tanker grounding,
or any of the myriad other events of this kind, the tendency
of often simple errors to wreak havoc continues. Despite the progress made,
systems have not yet been developed that are immune to the errors of those 
who operate them. The human genetic structure has been mapped, the 
Internet developed, and cell phones designed with more computing power 
than most computers had but a few short years ago, but human error has not 
yet been eliminated from complex systems. 

However, while error has not been eliminated, our understanding of the 
causes of errors has increased. Particularly in complex systems where there 
is little tolerance for errors, regulators, system designers, and operators have 
developed and implemented techniques that anticipate and address potential
opportunities for error and, it is hoped, prevent errors that can
jeopardize system safety from being committed.





The Crash of ValuJet Flight 592 


To illustrate how even simple errors can lead to a catastrophic accident, let us 
look at an event in one of our safest complex systems—commercial air trans- 
portation. Despite numerous measures that had been developed to prevent 
the very types of errors that occurred, several people, including some who 
were not even involved in the conduct of the accident flight, committed criti- 
cal errors that led to an accident. 

On May 11, 1996, just minutes after it had taken off from nearby Miami, 
Florida, a McDonnell Douglas DC-9 crashed into the Florida Everglades 
(National Transportation Safety Board, 1997). Investigators determined 
that the cause of the accident was relatively simple and straightforward:
an intense fire broke out in the airplane’s cargo compartment and within 
minutes burned through the compartment into the cabin, quickly spreading 
through the cabin. The pilots were unable to land before the fire degraded 
the airplane’s structural integrity. All onboard were killed in the accident 
(Figure 1.1). 

The investigation led to considerable worldwide media attention. As with 
any large-scale event involving a substantial loss of life, this was under- 
standable. But other factors played a part as well. The airline had been 
operating for less than 3 years, and it had employed what were then nontra- 
ditional airline practices. It had expanded rapidly, and in the months before 
the accident experienced two nonfatal accidents. After this accident, many 
criticized the airline, questioning its management practices and its safety 
record. Government officials initially defended the airline’s practices, but 
then reversed themselves. Just over a month after the accident, government 
regulators, citing deficiencies in the airline’s operations, forced it to suspend 
operations until it could satisfy their demands for reform. This led to even 
more media attention. 

FIGURE 1.1
The ValuJet accident site in the Florida Everglades. (Courtesy of the National Transportation 
Safety Board, 1997) 


As details about the crash emerged and more was learned, the scope of the 
tragedy increased. Minutes after takeoff, the pilots had declared an emer- 
gency, describing smoke in the cockpit. Within days investigators learned 
that despite strict prohibitions, canisters of chemical oxygen generators had 
been loaded onto the aircraft. It was believed that the canisters, the report of 
smoke in the cockpit, and the accident were related. 

Oxygen generators provide oxygen to airline passengers in the event of a 
cabin depressurization and are therefore designed to be safely transported 
in aircraft, provided the canisters are properly installed within protective 
housings. However, if the canisters are not packaged properly, or are shipped 
without locks to prevent initiation of oxygen generation, they could inadver- 
tently generate oxygen. The process creates heat as a by-product, bringing 
the surface temperature of the canisters to as high as 500°F (260°C). 

Investigators believed that canisters that lacked locks or other
protection were placed loosely in boxes and loaded into the airplane's cargo
hold underneath the cabin. After being jostled during takeoff and climb out, 
the canisters began generating oxygen. The canister surfaces became heated 
to the point that adjacent material in the cargo compartment was ignited 
and a fire began. The canisters then fed the fire with pure oxygen, producing
one of extraordinary intensity that quickly penetrated the fire-resistant
material lining the cargo hold, material that had not been designed to protect
against an oxygen-fed fire. The fire burned through the cabin floor and, with
the pure oxygen continuing to feed it, grew to the point where the structure
weakened and the airplane became uncontrollable. It crashed into the
Everglades, a body of shallow water, becoming submerged under its soft silt
floor (Figure 1.2).

FIGURE 1.2
Unexpended, unburned chemical oxygen generator, locking cap in place, but open. (Courtesy
of the National Transportation Safety Board, 1997)

Because of the potential danger that unprotected oxygen generators pose, 
they are considered hazardous and airlines are prohibited from loading 
unexpended and unprotected canisters of oxygen generators onto aircraft. 
Yet, after the accident, it was clear that someone had placed the canisters on 
the airplane. As a result, determining how and why the canisters were
loaded onto the airplane became a major focus of the investigation.

Investigators learned that no single error led to loading the canisters onto 
the aircraft. To the contrary, about 2 months before the accident, several indi- 
viduals committed relatively insignificant errors, in a particular sequence. 
Each error, in itself, was seemingly minor—the type that people may commit 
when rushed, for example. Rarely do these errors cause catastrophic consequences.
However, in this accident, despite government-approved standards
and procedures designed and implemented to prevent them, people still 
committed critical errors that resulted in a maintenance technician shipping 
three boxes of unexpended oxygen generators on the accident airplane. 

Although the errors may have appeared insignificant, a complex system 
such as commercial aviation has little room for even insignificant errors. 
Investigators seeking to identify the errors to determine their role in the 
cause of the accident faced multiple challenges. Many specialists had to 
methodically gather and examine a vast amount of information, then ana- 
lyze it to identify the critical errors, the persons who committed them, and 
the context in which the errors occurred. 

It took substantial effort to understand the nature of the errors that led to
this accident, and investigators succeeded in learning how the errors were
committed. The benefits of their activities were equally substantial. By meticulously
collecting and analyzing the necessary data, investigators were able to
learn what happened and why—information that managers and regulators 
then applied to system operations to make them safer. Many learned lessons 
from this accident, and they applied what they learned to their own operations.
While the tragedy of the accident cannot be diminished, it made the
aviation industry a safer one, and the industry has not witnessed a similar accident since.
This is the hope that guides error investigations, that circumstances similar 
to the event being investigated will not recur and that those facing the same 
circumstances will not repeat the errors made earlier. 





Investigating Error 


Today, in many industrialized countries, government agencies or commis- 
sions generally investigate major incidents and accidents. Some countries 
have established agencies that are dedicated to that purpose. For example, the 
National Transportation Safety Board in the United States, the Transportation 
Safety Board of Canada, and the Australian Transport Safety Bureau, inves- 
tigate incidents and accidents across transportation modes in their respec- 
tive countries. In other countries, government agencies investigate accidents 
in selected transportation modes, such as the Air Accidents Investigation 
Branch of Great Britain and the BEA (Bureau d'Enquêtes et d'Analyses pour
la sécurité de l'aviation civile) of France, which investigate commercial avia- 
tion accidents and incidents. 

However, when relatively minor accidents or incidents occur, organiza- 
tions with little, if any, experience may need to conduct the investigations 
themselves. Without the proper understanding, those investigating error 
may apply investigative procedures incorrectly or fail to recognize how the 
error came about. Although researchers have extensively examined error 
(e.g., Reason, 1990, 1997; Woods, Johannesen, Cook, and Sarter, 1994), there is 
little available to guide those wishing to investigate error. Despite the many 
accidents and incidents that are caused by operator error, it appears that few 
know a formal process to investigate errors or how to apply such a method 
during the course of an investigation. 

This book presents a method of investigating errors believed to have led 
to an accident or incident. It can be applied to error investigations in any 
complex system, although most of the examples presented are aviation 
related. This primarily reflects the long tradition and experience of agencies 
that investigate aviation accidents, and the author's experience participating 
in such investigations. Please consider the examples presented as tools to 
illustrate points made in the book and not as reflections on the susceptibility
of any one system or transportation mode to incidents or accidents. Neither 
the nature of the errors nor the process of investigating errors differs sub- 
stantially among systems. 

This book is designed for practitioners and investigators, as well as for stu- 
dents of error. It is intended to serve as a roadmap to those with little or no 
experience in human factors or in conducting error investigations. Though 
formal training in human factors, psychology, or ergonomics, or experience 
in formal investigative methodology is helpful, it is not required. The ability 
to understand and effectively apply an investigative discipline to the process 
is as important as formal training and experience. 

Chapters begin with reviews of the literature and, where appropriate, fol- 
low with explicit techniques on documenting data specific to the discussion 
in that chapter. Most chapters also end with “helpful techniques,” designed 
to serve as quick investigative references. 





Outline of the Book 


The book is divided into five sections, each addressing a different aspect 
of error in complex systems. Section I defines concepts that are basic to the
book: errors and complex systems. Section II focuses on types of antecedents
to error; Section III describes data sources and analysis techniques; Section IV
discusses three contemporary issues in human error; and Section V reviews
an accident in detail and presents thoughts on selected issues important to
error investigations.

Chapter 2 defines error in complex systems and introduces such critical 
concepts as operator, incident, accident, and investigation. Contemporary 
error theories are discussed, with particular attention devoted to Perrow’s 
description of system accidents (1999) and Moray’s (2000) and Reason’s (1990,
1997) models of error in complex systems. Changes in views of error over the
years are discussed. 

Chapter 3 discusses the analysis of data obtained in a human error inves- 
tigation. Different types of analyses are described and their relationship to 
human error explored. A hypothetical illustration of the application of the 
analysis methodology to an accident involving human error is presented, 
with the logic involved in each of the steps examined. 

Chapter 4 begins the focus on antecedents to error by examining the role 
of equipment in creating error antecedents, the source of much of the early 
scientific work in the field of human factors. Information display and control 
features that affect operator performance are discussed and illustrations of 
their relationship to operator errors in selected accidents are presented. 

Chapter 5 discusses antecedents pertaining to the system operator, historically
the primary focus of those investigating error. Behavioral and
physiological antecedents to error are examined, and antecedents that are 
operator-initiated or caused are differentiated from company-influenced 
antecedents. 

Chapter 6 reviews antecedents pertaining to companies that operate com- 
plex systems. These antecedents incorporate many that are discussed in ear- 
lier chapters, including operating procedures and company oversight of the 
application of those procedures to system operations. 

Chapter 7 examines antecedents related to regulators. It discusses the 
importance of regulators in both creating the rules under which complex systems
operate and enforcing those rules to ensure safe operation. Instances of
lax regulation in which the regulator created antecedents to organizational 
errors in a variety of settings, including the financial sector, are discussed. 

Chapter 8 assesses the impact of culture on error. Two types of culture, 
national and company related, are examined. Although they are distinct 
in terms of their relationship to antecedents, they share characteristics that 
influence operator performance. Several accidents, which illustrate the types 
of antecedents that can arise from cultural factors, are reviewed. 

Chapter 9 reviews operator teams and error antecedents that are unique 
to teams. The complexities of contemporary systems often call for operator 
teams with diverse skills to operate the systems. System features that neces- 
sitate the use of operator teams, the errors that members of these teams could 
commit, and their antecedents, are examined. 

Chapter 10 addresses the first of the data sources investigators rely on, 
electronic data that system recorders capture and record. Types of recorders 
used in different systems are examined and their contribution to the inves- 
tigation of error in those systems discussed. A recent accident is presented 
to illustrate how recorded data can provide a comprehensive view of the 
system state and an understanding of the errors leading to an accident. 

Chapter 11 discusses written documentation, an additional data source 
for investigators. Documentation critical to investigations, including records
that companies and government agencies maintain, such as medical and
personnel records, and factors that affect the quality of that information, are
discussed. Several accidents are reviewed to illustrate how written docu- 
mentation can help investigators understand both the errors that may have 
led to events in complex systems and their antecedents. 

Chapter 12 focuses on a third type of data for investigators, interview 
data, and their use in error investigations. Memory and memory errors are 
reviewed, and their effects on interviewee recall discussed. Types of inter- 
viewees are discussed and the factors pertaining to each, such as the type of 
information expected, the interview location, and the time since the event, 
examined. Suggestions to enhance interview quality and maximize the 
information they can provide are offered. 

Chapter 13 begins Section IV of the book, contemporary issues in error in
complex systems. This chapter examines antecedents that are exclusive to the 
maintenance and inspection environment. With the exceptions of Reason and 
Hobbs (2003) and Drury (1998), researchers have generally paid little attention to 
understanding maintenance and inspection errors. Antecedents to these errors 
include environmental factors, tool design, the tasks themselves, and other fac- 
tors related to the distinctive demands of system maintenance and inspection. 

Chapter 14 reviews situation awareness and decision making, and their rela- 
tionship to system safety. Factors that can influence situation awareness are 
discussed, many of which are also reviewed as error antecedents elsewhere 
in the book. The relationship of situation awareness to decision making is outlined.
Two models of decision making are reviewed: classical decision making,
applied to relatively static domains, and naturalistic decision making, employed
in dynamic environments. A case study involving a critical decision-making
error is presented to illustrate the role of decision making in system safety. 

Chapter 15 examines a third issue in error, automation, a subject that has 
received considerable attention in the literature on error and complex sys- 
tems, and in accident investigations. Automated systems have introduced 
unique antecedents. Their effects on operator performance in an accident 
involving a marine vessel are examined. 

Chapter 16 begins the section that reviews issues previously discussed in 
the book. It focuses on an accident in detail to illustrate many of the concepts 
and methodology presented throughout the book. An automation-related 
accident involving a Boeing 777, in which a series of interacting antecedents 
led to a basic and rather simple operator error, is detailed. The roles of the 
manufacturer, the company, and the regulator are examined in detail. 

In Chapter 17, the final chapter, goals outlined in the first chapter are reexamined.
Major principles of human error investigation, as discussed in earlier chapters,
are reviewed, and ways that investigations into error can be used proactively
to enhance system safety are suggested.

Each chapter is meant to stand alone, so that those interested in a spe- 
cific issue or technique can readily refer to the section of interest. The chap- 
ters may also be read out of sequence if desired. Nonetheless, reading them 
sequentially will provide a logical overview of the literature and the field 
itself. It is hoped that by the end of the book the reader will feel confident to 
effectively investigate error in a complex system. 


Errors, Complex Systems,
Accidents, and Investigations 











Patient accident reconstruction reveals the banality and triviality behind 
most catastrophes. 


Perrow, 1999, p. 9 
Normal Accidents 





Operators and Complex Systems 


There have been extraordinary changes in the machines that affect our 
daily lives. The equipment has become more complex, more sophisticated 
and more automated, while becoming more central to our activities. In 
commercial aviation, for example, two pilots were needed to fly the first 
commercially successful air transport aircraft, the Douglas DC-3, an air- 
craft that was designed over 80 years ago. The DC-3 could carry about 20 
passengers at a speed of about 200 miles an hour over several hundred 
miles. Today, two pilots are also needed to operate a passenger-carrying 
aircraft, the Airbus A-380, but this aircraft transports over 500 passengers, 
several thousand miles, at speeds in excess of 500 miles an hour. Although 
the acquisition and operating costs of the A-380 are many times those of 
its predecessor, the per-seat operating costs are lower. This has helped 
to make air transportation affordable to many more people than in the 
DC-3 era. 

Yet, there is a price that is paid for these technological advances. While the 
cost of travel has gone down substantially since the DC-3 era because mod- 
ern aircraft transport more people at lower cost than previously, more people 
are also exposed to the consequences of operator errors than was true of the 
earlier era. Accidents that occurred a century ago, such as ship fires, exposed 
relatively fewer people to risk whereas today thousands have been lost in 
single events, such as the 1987 sinking of a ferry in the Philippines or the
1984 chemical accident in Bhopal, India.*


Complex Systems 


People work with machines routinely and when they do they are machine 
operators. Whether operating lawn mowers, automobiles, tablets, or power 
saws, people use machines to perform tasks that they either cannot do them- 
selves, or can perform more quickly, accurately, or economically with the 
machines. Together the operator and the machine form a system in which 
each is a critical and essential system component. As Chapanis (1996) defines it,


A system is an interacting combination, at any level of complexity, of 
people, materials, tools, machines, software, facilities, and procedures 
designed to work together for some common purpose. (p. 22) 


Complex systems, which employ machines that require multiple operators 
with extensive training, support our way of life. They provide clean water,
sewage treatment, and electrical power, and they facilitate global finance, to name
but a few. These systems, considerably more sophisticated than, say, a person
operating a lawn mower, have become so integral to our daily activities that
in the event they fail, whole economies can be threatened.

However, as Perrow (1999) notes, the complexity of such systems has 
increased inordinately. 


We have produced designs so complicated that we cannot anticipate all 
the possible interactions of the inevitable failures; we add safety devices 
that are deceived or avoided or defeated by hidden paths in the systems. 
The systems have become more complicated because either they are deal- 
ing with more deadly substances, or we demand they function in ever 
more hostile environments or with ever greater speed and volume. (p. 12) 


As our dependence on systems increases, more is asked of them, and with 
their increasing technical capabilities we have witnessed increased complexity.
Complex systems are more than merely operators and equipment
working together; they are entities that typically perform numerous tasks of
considerable import to both companies and individuals. 

Although complex systems need not necessarily be high-risk systems, that 
is, systems in which the consequences of failure can be catastrophic, many 
authors apply the terms interchangeably. Systems that are sufficiently complex
are often high-risk systems, if for no other reason than that so many
people depend on them and thus interruptions of service can dramatically
affect our lives. Nonetheless, while the focus of this book is on complex systems,
the methodology to investigate human error described can be readily
applied to simple systems as well, even to the system in which one person
operates a lawn mower.

* On December 29, 1987, the ferry Doña Paz sank off the coast of Manila killing 4235 people. On
December 3, 1984, a gas leak at Union Carbide’s chemical processing plant in Bhopal, India,
killed an estimated 3800 people and injured thousands more.


Operators 


Operators interact with and control complex systems, and consequently 
play a central role in system safety. Despite the diversity of skills they need, 
equipment used, and settings in which they operate, one term can be used 
to describe them. While some have used terms such as “actor,” “technician,” 
“pilot,” “controller,” and “worker,” the term operator will be used here. 
In reference to maintenance activities, the terms technician and inspector 
will be used, as appropriate. 

Whether it is a financial, air transport, or electrical generating system, 
operators essentially perform two functions: they monitor the system and 
they control its operations. To do so, they obtain information from the system 
and its operating environment and combine it with their knowledge and 
experience to understand the system state. Based on that understanding, 
they modify operations, as needed, according to operational 
phase and the system-related information they perceive. 

Because of the potential severity of the consequences of error in complex 
systems, operators are expected to be skilled and qualified. They are the first 
line of defense in trying to limit the effects of system anomalies from becom- 
ing catastrophic. However, operators sometimes precipitate rather than pre- 
vent system incidents or accidents. 


Normal Accidents and Complex Systems 


The changes that have taken place over time in the complexity of these sys- 
tems have fundamentally altered the relationship between operators and the 
machines they control. Once directly controlling the machines, operators 
now largely supervise their operations. These tasks are typically performed 
at a higher cognitive and a lower physical level than was true of operators of 
earlier times who largely controlled the machines manually. 

Charles Perrow (1999) suggests that complex systems have changed to the 
extent that “interactive complexity” and “tight coupling” have made “nor- 
mal accidents” inevitable. That is, as systems have become more efficient, 
powerful, and diverse in the tasks they perform, the consequences of sys- 
tem failures have grown. In response, designers have increased the number 
of defenses against system malfunctions and operator errors, thus increas- 
ing internal system complexity. At the same time, systems have become 
tightly coupled, so that processes occur in strict, time-dependent sequences, 
with little tolerance for variability. Should a component or subsystem 
experience even a minor failure, little or no “slack” would be available within 
the system, and the entire process could be impacted. The combination of 
increased complexity and tight coupling has created system states that nei- 
ther designers nor operators had anticipated. 

Perrow suggests that unanticipated events in tightly coupled and highly 
complex systems will inevitably lead to accidents. As he explains, 


If interactive complexity and tight coupling—system characteristics— 
inevitably will produce an accident, I believe we are justified in calling 
it a normal accident, or a system accident. The odd term normal accident 
is meant to signal that, given the system characteristics, multiple and 
unexpected interactions of failures are inevitable. (p. 5) 


It seems difficult to accept that fundamental characteristics of complex 
systems have made catastrophic accidents “normal.” Perrow, however, has 
greatly influenced how incidents and accidents in complex systems are con- 
sidered by focusing not on the operator as the cause of an accident or inci- 
dent but on the system itself and its design. 

James Reason (1990, 1997), the British human factors researcher, expanded 
on Perrow’s theory by focusing on the manner in which system operation 
as well as system design can lead to errors. He suggests that two kinds of 
accidents occur in complex systems: “individual accidents,” which result 
from the actions of people, and “organizational accidents,” which result 
largely from the actions of companies and their managers. Reason’s (1997) 
description of organizational accidents has much in common with Perrow’s 
normal accidents: 


These [organizational accidents] are the comparatively rare, but often 
catastrophic, events that occur within complex modern technologies 
such as nuclear power plants, commercial aviation, the petrochemi- 
cal industry, chemical process plants, marine and rail transport, banks 
and stadiums. Organizational accidents have multiple causes involving 
many people operating at different levels of their respective companies. 
Organizational accidents...can have devastating effects on uninvolved 
populations, assets and the environment. (p. 1) 


Both Reason and Perrow suggest that, given changes in the nature and 
function of these systems, new and largely unanticipated opportunities for 
human error have been created. 

Vicente (1999) elaborates on the work of Reason and Perrow and identi- 
fies elements of what he refers to as “sociotechnical systems,” which have 
increased the demands on system operators. These include the social needs 
and different perspectives of team members that often operate complex sys- 
tems, the increasing distance among operators and between operators and 
equipment, the dynamic nature of systems, increasing system automation, 
and uncertain data. By escalating the demands on operators, each element 
has increased the pressure on them to perform without error. 

Human fallibilities being what they are, there will always be a possibility 
that an operator will commit an error, and that the consequences of even 
“minor” errors will present a threat to the safety of complex systems. Some, 
such as Senders and Moray (1991), Hollnagel (1993), and Reason (1997), suggest 
accepting that operator error cannot be eliminated and focusing not on 
error itself but on minimizing the consequences of errors. 





Human Error 


Most errors are insignificant and quickly forgotten. The relatively minor 
consequences of most human errors justify the relative inattention we pay 
them. Some circumstances even call for errors, such as when learning new 
skills. Children who learn to ride bicycles are expected to make numerous 
errors initially, but fewer errors as they become more proficient, until they 
reach the point of riding without error. Designers and training professionals, 
recognizing the value of errors in learning environments, have developed 
system simulators that enable operators to be trained in operating systems 
in realistic environments, free of the consequences of error. 

People require feedback after they have erred; without it, they may not 
even realize that they have committed errors. Someone who forgets to 
deposit money into a checking account may continue to write checks with- 
out recognizing that the account lacks sufficient funds. That person would 
not likely be considered to be committing an error each time he or she wrote 
a check. Rather, most would consider the person to have committed only one 
error—the initial failure to deposit funds into the account. 

It should be apparent that the nature of errors and the interpretation and 
determination of their significance are largely contextual. Turning a crank 
the wrong way to close an automobile window is a minor error that would 
probably be quickly forgotten. On the other hand, turning a knob in the 
control room of a nuclear power plant in the wrong direction can lead to a 
nuclear accident. Both errors are similar—relatively simple acts of rotating a 
control in the wrong direction—yet under certain conditions an otherwise 
minor error can cause catastrophic consequences. What ultimately differen- 
tiates errors are their contexts and the relative severity of their consequences. 





Theories of Error 


Modern error theory suggests that in complex systems, operator errors are 
the logical consequences of antecedents or precursors that had been present 
in the systems. Theorists have not always considered system antecedents to 
play as large a role in error causation as they are considered to play today. 


Freud 


Freud and his students believed that error is a product of the unconscious 
drives of the person (e.g., Brenner, 1964). Those who erred were considered 
less effective and possibly more deficient than those who did not, an 
interpretation that has had wide influence on theories of error and on subsequent 
research. For example, the concept of “accident proneness,” influenced by 
Freud’s view of error, attributed to certain people a greater likelihood of 
committing errors than to others because of their personal traits. However, 
studies (e.g., Rodgers and Blanchard, 1993; Lawton and Parker, 1998) have 
found serious methodological deficiencies in the initial studies upon which 
many of the later assumptions about error proneness had been based. For 
example, the failure to control for rates of exposure to risk limited the 
applicability of the conclusions derived. Lawton and Parker conclude, “...it 
proved impossible to produce an overall stable profile of the accident-prone 
individual or to determine whether someone had an accident-prone person- 
ality” (p. 656). The application of Freud’s theories (he used multiple theories 
to explain human behavior) outside of clinical settings has largely fallen into 
disfavor as both behavioral and cognitive psychological theories have gained 
increasing acceptance. Unlike Freud, error theorists since his day consider 
the setting in which errors are committed to be far 
more important than the characteristics of the person committing the error. 


Heinrich 


Heinrich (1931, 1941) was among the first to systematically study accident 
causation in industrial settings. He suggested that incidents and accidents 
can be prevented by breaking the causal link in the sequence or chain of 
events that led up to them. Focusing on occupational injuries, that is job- 
related injuries, he suggested that accidents result from a sequence of events 
involving people’s interactions with machines. One step leads to others in a 
fixed and logical order, much as a falling domino causes subsequent standing 
dominoes to fall, ultimately leading to an incident or accident. Heinrich 
suggested that incidents and accidents form a triangle or pyramid of 
frequency, with non-injury incidents, which occur the most often, located at 
the bottom of the pyramid, incidents with minor injuries, which occur less 
often than non-injury incidents, at the middle of the pyramid, and accidents 
with serious injuries, which occur the least often, at the top of the pyramid. 
To Heinrich, two critical underlying factors leading to accidents were per- 
sonal or mechanical hazards resulting from carelessness and poorly designed 
or improperly maintained equipment. Carelessness and other “faults” were, 
to Heinrich, the result of environmental influences, that is, the environment 
in which people were raised, or traits that they inherited. Heinrich's work, 
with its systematic study of accident causation, had considerable influence on 
our view of accident causation. Coury, Ellingstad, and Kolly (2010) wrote that 
as a result of Heinrich's work, many have come to view accident causation as 
a series of links in a chain, to be prevented by breaking the link or sequence 
of events. 


Norman 


Norman (1981, 1988) studied both cognitive and motor errors and differenti- 
ated between two types of errors: slips and mistakes. Slips are action errors 
or errors of execution that are triggered by schemas, a person's organized 
knowledge, memories, and experiences. Slips can result from errors in the 
formation of intents to act, faulty triggering of schemas, or mental images of 
phenomena, among other factors. He categorized six types of slips, exempli- 
fied by such relatively minor errors as striking the wrong key on a computer 
keyboard, pouring coffee into the cereal bowl instead of the cup adjacent to 
the bowl, and speaking a word other than the one intended. 

Mistakes are errors of thought in which a person's cognitive activities lead 
to actions or decisions that are contrary to what was intended. To Norman, 
slips are errors that logically result from the combination of environmen- 
tal triggers and schemas. Applying the lessons of slips to design, such as 
standardizing the direction of rotation of window cranks in automobiles, 
would, to Norman, reduce the number of environmental triggers and there- 
fore reduce the likelihood of slips. 


Rasmussen 


Jens Rasmussen (1983), a Danish researcher, expanded the cognitive aspects 
of error that Norman and others described, by defining three levels of opera- 
tor performance and three types of associated errors: skill-, knowledge-, and 
rule-based. Skill-based performance, the simplest of the three, relies on skills 
that a person acquires over time and stores in memory. Skill-based performance 
errors are similar to Norman's slips in that they are largely errors of 
execution. With rule-based performance, more advanced than skill-based, 
operators apply rules to situations that are similar to those that they have 
encountered through experience and training. Rule-based performance 
errors result from the inability to recognize or understand the situations or 
circumstances encountered. This can occur when the information necessary 
to understand the situation is unavailable, or the operator applies the wrong 
rule to unfamiliar circumstances. 

Rasmussen maintains that the highest level of performance is knowledge- 
based. Rather than applying simple motor tasks or rules to situations that 
are similar to those previously encountered, the operator applies previously 
learned information, or information obtained through previous experience, 
to novel situations to analyze or solve problems associated with those 
situations. Knowledge-based performance errors result primarily from 
shortcomings in operator knowledge or limitations in his or her ability to apply 
existing knowledge to new situations. 


Reason 


James Reason (1990) enlarged the focus of earlier definitions of errors and 
further distinguished among basic error types. He defines slips as others 
have, as relatively minor errors of execution, but he also identifies an 
additional type of error, a lapse, which he characterizes as primarily a memory 
error. A lapse is less observable than a slip and occurs when a person becomes 
distracted when about to perform a task, or omits a step when attempting to 
complete the task. 

Reason also distinguishes between mistakes and violations. Both are 
errors of intent: mistakes result from inappropriate intentions or incorrect 
diagnoses of situations, whereas violations are actions that are deliberately 
nonstandard or contrary to procedures. 

Reason does not necessarily consider violations to be negative. Operators 
often develop violations to accomplish tasks in ways they believe would be 
more efficient than those accomplished by following procedures that design- 
ers and managers developed. By contrast, Reason considers a deliberate act, 
intended to undermine the safety of the system, to be sabotage. 

Reason’s categorization of errors corresponds to Rasmussen’s perfor- 
mance-based errors. Slips and lapses are action errors that involve skill- 
based performance while mistakes involve either rule- or knowledge-based 
performance. 
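
The correspondence just described can be written down as a small lookup table. The following Python sketch is offered only as an illustration of that correspondence; the names PerformanceLevel, ERROR_TYPE_LEVELS, levels_for, and is_action_error are hypothetical and do not come from Reason or Rasmussen. Violations are omitted because the text does not assign them to a performance level.

```python
from enum import Enum


class PerformanceLevel(Enum):
    """Rasmussen's three levels of operator performance."""
    SKILL = "skill-based"
    RULE = "rule-based"
    KNOWLEDGE = "knowledge-based"


# Reason's error types mapped to Rasmussen's levels, as summarized in the text.
# Violations are deliberately left out: the text does not assign them a level.
ERROR_TYPE_LEVELS = {
    "slip": frozenset({PerformanceLevel.SKILL}),
    "lapse": frozenset({PerformanceLevel.SKILL}),
    "mistake": frozenset({PerformanceLevel.RULE, PerformanceLevel.KNOWLEDGE}),
}


def levels_for(error_type):
    """Return the performance levels associated with an error type.

    Raises KeyError for types the text does not map (e.g., "violation").
    """
    return ERROR_TYPE_LEVELS[error_type.lower()]


def is_action_error(error_type):
    """Slips and lapses are the action errors: they involve skill-based performance only."""
    return levels_for(error_type) == frozenset({PerformanceLevel.SKILL})


if __name__ == "__main__":
    for name in ("slip", "lapse", "mistake"):
        levels = sorted(level.value for level in levels_for(name))
        print(name, "->", ", ".join(levels))
```

An investigator who has classified an error as a mistake would, on this reading, still need to determine whether rule-based or knowledge-based performance was involved; the table narrows the question but does not answer it.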

Reason, however, added to previous error theories by addressing the 
role of designers and company managers in operator errors, that is, those 
who function at the higher levels of system operations, at what he labels 
the “blunt end” of a system. Those at the blunt end commit what he terms 
“latent errors” (which he later referred to as “latent conditions”; Reason, 1997) within a 
system. Operators, located at the “sharp end” of a system, commit what he 
calls “active errors,” errors that directly lead to accidents. 

Operators’ active errors are influenced, Reason argues, by latent errors 
that those at the blunt end have committed, errors that lie hidden within the 
system. Although active errors lead to consequences that are almost imme- 
diately recognized, the consequences of latent errors may go unnoticed for 
some time, becoming manifest only when a combination of factors weakens 
system defenses against active errors. Designers and managers place internal 
defenses in systems to prevent errors from leading to incidents and accidents 
in recognition of the potential fallibility of human performance. However, 
should the defenses fail when an operator commits an error, catastrophic 
consequences could occur. 

Reason (1990) uses a medical analogy to explain how latent errors can 
affect complex systems: 


There appear to be similarities between latent failures in complex tech- 
nological systems and resident pathogens in the human body. The resi- 
dent pathogen metaphor emphasises the significance of causal factors 
present in the system before an accident sequence actually begins. All 
man-made systems contain potentially destructive agencies, like the 
pathogens within the human body. At any one time, each complex sys- 
tem will have within it a certain number of latent failures, whose effects 
are not immediately apparent but that can serve both to promote unsafe 
acts and to weaken its defence mechanisms. For the most part, these 
are tolerated, detected and corrected...but every now and again a set 
of external circumstances—called here local triggers—arise that com- 
bines with these resident pathogens in subtle and often unlikely ways to 
thwart the system's defences and to bring about its catastrophic break- 
down. (p. 197) 


Reason illustrates how company-related defenses and resident pathogens 
affect safety by pointing to slices of Swiss cheese that are lined up against 
each other (Figure 2.1). Unforeseen system deficiencies, such as questionable 
managerial and design decisions, precede managers’ actions. These lead to 
“psychological precursors” among operators such as reactions to stress or 
to other aspects of the “human condition,” and to unsafe acts. These rep- 
resent the holes in the Swiss cheese whereas the solid parts of the cheese 
slices represent company defenses against the hazards of unsafe acts. If the 
Swiss cheese slices were placed one against the other, the holes or deficiencies 
would be unlikely to line up in sequence. Company-related defenses, 
the solid portions of the cheese, would block an error from penetrating. 
However, should the deficiencies (holes) line up uniquely, an active error 
could breach the system, much as an object could move through the holes in 
the slices; the unsafe act would not be prevented from affecting the system, 
and an accident could result. 

FIGURE 2.1 
Reason's model of error. (From Reason, J. T. 1997. Aldershot, England: Ashgate. Copyright 
Ashgate Publishing. Reprinted with permission.) 
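
The reasoning behind the lined-up holes can be illustrated numerically. The following Python sketch is not part of Reason's model. It assumes, purely for illustration, that each defensive layer has a known chance of presenting a hole at the moment of an unsafe act, and that the layers fail independently of one another. Reason's account of latent conditions suggests the independence assumption is optimistic, since a single managerial or design decision may weaken several defenses at once. The function name and the probabilities are hypothetical.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Chance that a single unsafe act passes through every defensive layer.

    Simplifying assumption (not from Reason): layers fail independently,
    so the per-layer probabilities of a hole simply multiply.
    """
    for p in hole_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return prod(hole_probabilities)


if __name__ == "__main__":
    # Made-up values: three layers, each with a 10% chance of a hole.
    intact = [0.1, 0.1, 0.1]
    # The same system after a latent condition has degraded one layer.
    degraded = [0.5, 0.1, 0.1]
    print("intact defenses:  ", breach_probability(intact))
    print("degraded defenses:", breach_probability(degraded))
```

Under these assumptions, degrading a single layer raises the chance of a breach in direct proportion, which is one way to see why latent conditions matter even when they cause no immediate harm.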

To Reason, even though managerial and design errors are unlikely to lead 
directly to accidents and incidents, an examination of human error should 
assess the actions and decisions of managers and designers at the blunt end 
at least as much as, if not more than, the actions of the system operators at the 
sharp end. His description of the role of both design and company-related or 
managerial antecedents of error has greatly influenced our understanding of 
error, largely because of its simplicity, rationality, and ease of understanding. 
Further, his approach to developing a model to explain error causation was 
also influential. For example, the International Civil Aviation Organization 
(ICAO) has formally adopted Reason's model of error for its member states 
to facilitate their understanding of human factors issues and aviation safety 
(ICAO, 1993). 

Dekker and Pruchnicki (2014) updated Reason’s model, in the light of 
several major accidents and theoretical work that had been conducted 
since his initial work on error was published. Errors in complex systems 
that lead to accidents and incidents, they argue, are often preceded by 
extensive periods, which they refer to as “incubation periods,” in which 
the latent errors of which Reason speaks, or organizational shortcomings, 
gradually increase but remain unrecognized. These shortcomings may be 
taken for granted, or go unrecognized over time as the risks increase and 
the organization or company gradually “drifts” toward an accident. As 
they note: 


Pressures of scarcity and competition, the intransparency and size of 
complex systems, the patterns of information that surround decision 
makers, and the incremental nature of their decisions over time, all 
enter into the incubation period of future accidents. Incubation hap- 
pens through normal processes of reconciling differential pressures on 
an organisation (efficiency, capacity utilisation, safety) against a back- 
ground of uncertain technology and imperfect knowledge. Incubation 
is about incremental, or small, seemingly insignificant steps eventually 
contributing to extraordinary unforeseen events. (p. 541) 


What Is Error 


Researchers generally agree on the meaning of an error. To Senders and 
Moray (1991), it is “something [that] has been done which was not intended 
by the actor, not desired by a set of rules or an external observer, or that 
led the task or system outside its acceptable limits” (p. 25). Reason (1990) 
sees an error as “a generic term to encompass all those occasions in which a 
planned sequence of mental or physical activities fails to achieve its intended 
outcome, and when these failures cannot be attributed to the intervention 
of some chance agency” (p. 5). Woods, Johannesen, Cook, and Sarter (1994) 
define error as “a specific variety of human performance that is so clearly and 
significantly substandard and flawed when viewed in retrospect that there is 
no doubt that it should have been viewed by the practitioner as substandard 
at the time the act was committed or omitted” (emphasis in original, p. 2). 

Hollnagel (1993) believes that the term “human error” is too simplistic and 
that “erroneous action” should be used in its place. An erroneous action, he 
explains, “is an action which fails to produce the expected result and which 
therefore leads to an unwanted consequence” (p. 67). He argues that one 
should not make judgments regarding the cause of the event. The term erro- 
neous action, unlike error, implies no judgment and accounts for the context 
in which the action occurs. 

Despite some disagreement in defining error, most researchers agree on 
the fundamental aspects of error, seeing it as the result of something that 
people do or intend to do that leads to outcomes different from what they had 
expected. Therefore, to be consistent with these views, error will be defined 
in this book as an action or decision that results in one or more unintended negative 
outcomes. Errors that occur in learning or training environments, where they 
are expected, tolerated, and used to enhance and enlarge a person’s reper- 
toire of skills and knowledge, will not be considered further. 

For our purposes, even though researchers have described multiple types of 
errors, insofar as accident or incident investigations are concerned, only two 
types of errors are important: action errors and decision errors. In an action 
error, an operator does something wrong, such as shutting down a system that 
should have continued in operation, or does something contrary to what had 
been called for by company procedures. Decision errors refer to incorrect 
decisions that operators make, such as misinterpreting weather information 
and proceeding into an area of adverse weather. In general, errors related to 
equipment control design antecedents tend to be action errors. Errors in tasks 
that call for interpretation, such as navigation or understanding the meaning of 
multiple alarms, tend to be decision errors. 


Error Taxonomies 


Senders and Moray (1991) developed an error taxonomy based largely on 
the work of Rasmussen, Reason, and others, to better understand errors and 
the circumstances in which people commit errors. Their taxonomy suggests 
that error results from one or more of the following factors, operating alone 
or together: the person’s “information-processing system” or cognitive 
processes; environmental effects; pressures on and biases of the individual; and 
the individual’s mental, emotional, and attentional states. This taxonomy 
describes errors in terms of four levels: “phenomenological” or observable 
manifestations of error, cognitive processes, goal-directed behaviors, and 
external factors, such as environmental distractions or equipment design 
factors. 

Shappell and Wiegmann (1997, 2001) propose a taxonomy to apply to the 
investigation of human error in aircraft accidents, a model that has since 
been embraced and applied by such U.S. agencies as the U.S. Coast Guard, 
in the investigation of marine accidents. Expanding on Reason's work, their 
taxonomy differentiates among operations that are influenced by unsafe 
supervision, unsafe conditions, and unsafe acts. Unsafe acts include various 
error categories, while unsafe conditions include both behavioral and physi- 
ological states and conditions. Unsafe supervision, which distinguishes 
between unsafe supervisory actions that are unforeseen and those that are 
foreseen, incorporates elements that Reason would likely term latent errors 
or latent conditions. 

Sutcliffe and Rugg (1998) propose a taxonomy based on Hollnagel’s (1993), 
that distinguishes between error phenotypes (the manifestation of errors) 
and genotypes (their underlying causes). They group the operational descrip- 
tions of errors into six categories and divide causal factors into three groups: 
cognitive, social and company-related, and equipment or tool design. 

O’Hare (2000) proposed a taxonomy, referred to as the “Wheel of 
Misfortune,” to serve as a link between researchers in human error and acci- 
dent investigators seeking to apply research findings to incidents or acci- 
dents. As with Reason, he delineates company-related defenses that could 
allow operator error to affect system operations unchecked. 





Incidents, Accidents, and Investigations 


Incidents and Accidents 


Loimer and Guarnieri (1996), in a review of accident history, described how 
the meaning of the term has changed over the years. Aristotle, for example, used 
accidents to refer to nonessential or extrinsic characteristics of people and 
things. Thus, someone could have accidental qualities, for example, one leg, 
and still retain human characteristics. About the fourteenth century, the 
English began to use a more modern understanding of the term, closer to 
that of contemporary times, that is, “to happen by chance; a misfortune; an 
event that happens without foresight or expectation” (p. 102), a usage initially 
found in Chaucer in 1374. As the industrial revolution developed in the late 
eighteenth century, injuries of workers in the textile, railroad, and mining 
industries began to emerge. These were new types of accidents that occurred 
among workers who were operating what were then complex systems, but of 
course system operations required considerably more muscular effort than 
is true today, with little design and training consideration directed to worker 
safety. Loimer and Guarnieri noted that accident attribution began to change 
around that time as well, from divine influence, the attribution common in the 
Middle Ages, to worker causation, for example, carelessness, during the 
industrial revolution. 

Coury et al. (2010) wrote that World War II brought about considerable com- 
plexity in systems such as aircraft used in the war effort. System complexity 
was also influenced by the rapid development of and the need to quickly uti- 
lize these systems, which called for hastily training people to operate them, 
factors that contributed to high rates of training accidents. In attempting to 
understand the reasons for the accident rates, researchers focused on opera- 
tor error from the perspective of factors related to the design of the system 
controls and displays, rather than on the operator himself or herself, a focus 
that led to research to better understand how machine design can lead to 
error. 


Process Accidents 


Today, researchers devote considerable attention to examining on-the-job 
injuries, especially in such industries as petrochemical processing and min- 
ing (e.g., Flin, Mearns, O'Connor, and Bryden, 2000). But the nature of acci- 
dent causation is typically different in worker injury accidents than it is in 
process accidents. In the former, accident causation is largely considered the 
result of flaws in control design, training, or worker attention. In the latter, 
the type that is the focus of this book, causation is generally attributed to 
flaws in the system itself, which can include design, training, and worker 
attention but typically involves elements of the entire system. Certainly, the 
consequences of the two are different as well. Occupational accident con- 
sequences primarily affect system operators while process accidents may 
affect the workers or operators, but just as often affect those uninvolved in 
system operations, such as passengers in transportation accidents, or residents 
near a nuclear generating station that sustained a radiation leak. 

Senders and Moray (1991), focusing on process accidents, term an accident 
“a manifestation of the consequence of an expression of an error” (p. 104). 
Others suggest that accidents are events that are accompanied by injury to 
persons or damage to property. In this way, even minor injuries can change 
the categorization of an incident, typically involving an occurrence of more 
minor consequences, to that of an accident, an occurrence with often major 
or severe consequences. Those consequences can be injuries to persons, 
damage to property, and/or pollution of the air, water, or land environment. 

Perrow (1999) distinguished between accidents and incidents largely by 
the extent of the damage to property and injuries to persons. He consid- 
ers incidents to be events that damage parts of the system, and accidents 
events that damage subsystems or the system as a whole, resulting in the 
immediate shutdown of the system. Although a system accident may start 
with a component failure, it is primarily distinguished by the occurrence 
of multiple failures interacting in unanticipated ways. Catastrophic system 
accidents may bring injury or death to bystanders uninvolved with the sys- 
tem, or even to those not yet born. For example, accidents in nuclear generat- 
ing stations can lead to birth defects and fertility difficulties among those 
exposed to radiation released in the accident. 


Legal Definitions 


Whether an event is classified as an incident or an accident can have 
considerable influence on data analysis and research, as well as on civil or 
criminal proceedings. Therefore, much attention has been devoted to the 
classification of accidents. Loimer and Guarnieri (1996) describe the historic 
tradition, dating to the Middle Ages, of accident causation being attributed to 
acts of god as compared to acts of people. Today, they note, any accident that is 
caused, directly or indirectly, by natural causes “without human interven- 
tion” is considered to be “an act of god.” In this respect, the March 11, 2011, 
accident at the Fukushima Daiichi nuclear power plant, which occurred in 
the aftermath of a magnitude 9 earthquake and subsequent tsunami, may be 
considered an act of god, despite the fact that the direct cause of the nuclear 
accident was the flooding of the diesel generators that provided electric 
power for emergency water cooling to the nuclear core. Water from the tsu- 
nami entered and contaminated the generators, which had been placed at 
ground level, thereby making them susceptible to flooding given the reac- 
tor’s proximity to the sea. For our purposes, even though the flooding was 
a naturally caused event, the placement of the generators adjacent to the 
sea was not, and thus investigators would still want to examine the system 
shortcomings that allowed the tsunami to result in a nuclear accident. 

In general, the contemporary classification of events into accidents ignores 
the natural- versus person-caused aspect to focus on the severity of the con- 
sequences. Consequently, specific definitions in both international law and 
in the laws of individual nations define accidents. For example, ICAO (1970) 
defines an aircraft accident as, 


An occurrence associated with the operation of an aircraft which takes 
place between the time any person boards the aircraft with the intention 
of flight until such time as all such persons have disembarked, in which: 
a person is fatally injured...or 

the aircraft sustains damage or structural failure...or 

the aircraft is missing or is completely inaccessible. (p. 1) 


ICAO also precisely defines injury and death associated with an accident. 
Injuries include broken bones other than fingers, toes, or noses, or any of the 
following: hospitalization for at least 48 hours within 7 days of the event, 
severe lacerations, internal organ damage, second- or third-degree burns 
over 5% or more of the body, or exposure to infectious substances or injurious
radiation. A fatal injury is defined as a death from accident-related injuries
that occurred within 30 days of the accident. An incident is an event that
is less serious than an accident.
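
The injury thresholds summarized above can be read as a small decision rule. The following Python sketch is offered only as an illustration of that reading, not as ICAO's own procedure: the `InjuryReport` record, its field names, and the three labels returned by `classify` are assumptions introduced here, and the burn threshold is taken as "5% or more" following the wording of this text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InjuryReport:
    """Hypothetical injury record; all field names are illustrative assumptions."""
    days_until_death: Optional[int] = None     # None if the person survived
    accident_related_death: bool = False
    fracture_excluding_fingers_toes_nose: bool = False
    hospital_hours_within_7_days: float = 0.0
    severe_laceration: bool = False
    internal_organ_damage: bool = False
    burn_degree: int = 0                       # 0 means no burn
    burn_body_fraction: float = 0.0            # 0.05 means 5% of the body
    hazardous_exposure: bool = False           # infectious substances or injurious radiation


def classify(report: InjuryReport) -> str:
    """Return 'fatal', 'injury', or 'none' under the criteria summarized in the text."""
    # Fatal: death from accident-related injuries within 30 days of the accident.
    if (report.accident_related_death
            and report.days_until_death is not None
            and report.days_until_death <= 30):
        return "fatal"
    # Injury: any one of the listed conditions is sufficient.
    qualifies = (
        report.fracture_excluding_fingers_toes_nose
        or report.hospital_hours_within_7_days >= 48
        or report.severe_laceration
        or report.internal_organ_damage
        or (report.burn_degree >= 2 and report.burn_body_fraction >= 0.05)
        or report.hazardous_exposure
    )
    return "injury" if qualifies else "none"
```

Under these assumptions, a second-degree burn over 4% of the body would not by itself meet the definition, whereas 48 hours of hospitalization would.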

Other government or international agencies use similar definitions, albeit 
specific to the particular domain. For example, the U.S. Coast Guard defines 
a marine accident as, 


Any casualty or accident involving any vessel other than public vessels if 
such casualty or accident occurs upon the navigable waters of the United 
States, its territories or possessions or any casualty or accident wherever 
such casualty or accident may occur involving any United States vessel 
which is not a public vessel...[including] any accidental grounding, or 
any occurrence involving a vessel which results in damage by or to the 
vessel, its apparel, gear, or cargo, or injury or loss of life of any person; 
and includes among other things, collisions, strandings, groundings, 
founderings, heavy weather damage, fires, explosions, failure of gear 
and equipment and any other damage which might affect or impair the 
seaworthiness of the vessel...[and] occurrences of loss of life or injury to 
any person while diving from a vessel and using underwater breathing 
apparatus. (46 Code of Federal Regulations 4.03-1 (a) and (b)) 


Under U.S. law, 46 U.S. Code § 6101, a major marine accident, referred to as 
a “major marine casualty,” is defined as: 


...a casualty involving a vessel, other than a public vessel, that 
results in— 


1. The loss of 6 or more lives. 

2. The loss of a mechanically propelled vessel of 100 or more gross 
tons. 

3. Property damage initially estimated at $500,000 or more. 

4. Or serious threat, as determined by the Commandant of the 
Coast Guard with concurrence by the Chairman of the National 
Transportation Safety Board, to life, property, or the environ- 
ment by hazardous materials. 
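
The four criteria quoted above amount to a checklist in which any single item is sufficient. The Python sketch below illustrates that structure and is not a legal test: the `Casualty` record and its field names are hypothetical, and the fourth criterion, which rests on a determination by the Commandant with the concurrence of the Chairman, is reduced here to a single flag.

```python
from dataclasses import dataclass


@dataclass
class Casualty:
    """Hypothetical record of a marine casualty; field names are illustrative."""
    public_vessel: bool                  # public vessels are excluded by the definition
    lives_lost: int
    vessel_lost: bool
    mechanically_propelled: bool
    gross_tons: float
    initial_damage_estimate_usd: float
    hazmat_threat_determined: bool       # stands in for the official determination


def is_major_marine_casualty(c: Casualty) -> bool:
    """Return True if any one of the four quoted criteria is met."""
    if c.public_vessel:
        return False
    return (
        c.lives_lost >= 6
        or (c.vessel_lost and c.mechanically_propelled and c.gross_tons >= 100)
        or c.initial_damage_estimate_usd >= 500_000
        or c.hazmat_threat_determined
    )
```

Because the criteria are joined by "or," a casualty with no loss of life can still qualify on property damage alone.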


To avoid confusion among the various definitions, both incidents and 
accidents in complex systems will be defined as: unexpected events that cause 
substantial property or environmental damage and/or serious injuries to people. 
Accidents lead to consequences that are more severe than those of incidents. 


Investigations 


Coury et al. (2010) reviewed the history of accident investigations in complex 
systems, focusing on transportation accident investigations, and noted how 
the evolution of accident investigation matched the corresponding evolution 
in technology. As technology became more reliable, investigations focused
less on hardware and more on the role of those who operate the systems. 
Although companies often investigated the accidents of systems they owned 
and operated, governments also played a role in the investigations, often 
initially in the role of coroners’ inquests. Eventually, investigations went 
beyond identifying the accident cause as operator error, or pilot error in 
the case of aviation, to focus on the nature of the interaction between the 
operator and the system being operated. Coury et al. (2010) note that with 
the advent of World War II, human factors emerged as a major element of 
accident investigations. “No longer was it acceptable,” they note, “to merely
identify the type of pilot error; now the design of the system and its contri- 
bution to the error must also be considered” (p. 16). Further, as investigators 
gained a more sophisticated understanding of error in accident causation, in 
aviation accident investigations, 


Pilot and operator error were no longer simply categories within causal 
taxonomies but instead reflected a more complex interaction between 
people and machines that could be empirically studied and even 
“designed out” of human machine systems. As a result, human factors 
and human performance assumed a larger role in accident investigation, 
in which safety issues were related to potential incompatibilities with 
human information processing, and cognition and influenced the way 
accident investigators thought about pilot actions. (p. 16) 


Le Coze (2013), in a review of major models of investigations, describes 
two “waves” of highly visible major accidents that occurred in the past 20-30 
years that have impacted our view of accidents. The recent accidents, which 
involved a variety of complex systems, 


...all come under the same intense national and often also international 
interest and scrutiny by the media, justice systems, civil society, states, 
financial markets, industry and professions. They have a strong sym- 
bolic component, where each time, and probably at Fukushima more 
than elsewhere, a belief about the safety of these systems that had previ- 
ously been taken for granted has seriously been undermined. (p. 201) 


The accidents to which he referred include, in the first wave, the 1986 
explosion of the space shuttle Challenger, the ground collision at Tenerife of 
two Boeing 747s, and the grounding of the tanker Exxon Valdez. The acci- 
dents in the second wave include the grounding of the cruise vessel Costa 
Concordia, and the meltdown at the Fukushima Daiichi nuclear power plant 
that followed the earthquake and tsunami. 

Although today accident investigations are conducted to identify the cause 
or causes of accidents and thereby develop ways of mitigating future oppor- 
tunities for error and malfunctions, investigations may fulfill multiple mis- 
sions as well. Senders and Moray (1991) acknowledge that investigations can 
be conducted for a variety of purposes. “What is deemed to be the cause of
an accident or error,” they write, “depends on the purpose of the inquiry. 
There is no absolute cause” (emphasis in original, p. 106). For example, law 
enforcement personnel conduct criminal investigations to identify perpe- 
trators of crimes and to collect sufficient evidence to prosecute and convict 
them. Governments investigate accidents to protect the public by ensuring 
that the necessary steps are taken to prevent similar occurrences, mandating 
necessary changes to the system or changing the nature of its oversight of 
the system. Kahan (1999) notes that governments have become increasingly 
involved in investigating transportation accidents. Whereas governments 
initially investigated accidents on an individual basis and assigned investi- 
gators to the investigations as they occurred, many governments have estab- 
lished agencies with full-time investigative staffs for the exclusive purpose 
of investigating accidents. 

Rasmussen, Pejtersen, and Goodstein (1994) contend that investigators 
examine system events according to a variety of viewpoints. These include 
a common sense one, and those of the scientist, reliability analyst, therapist, 
attorney, and designer, respectively. Each influences what Rasmussen et al. 
(1994) refer to as an investigation's “stopping point,” that is, the point at which 
the investigator believes that the objectives of the investigation have been met. 

For example, an investigator with a common sense perspective stops the 
investigation when satisfied that the explanation of the event is reasonable 
and familiar. The scientist concludes the investigation when the mechanisms 
linking the error antecedent to the operator who committed the error are 
known, and the attorney concludes the investigation when the one respon- 
sible for the event, usually someone directly involved in the operation who 
can be punished for his or her actions or decisions, is identified. The objec- 
tive advocated in this book is based on the suggestions of Rasmussen et al. 
(1994). Investigators should conduct investigations to learn what caused an 
incident or accident by establishing a link between antecedent and error, so 
that changes can be implemented to prevent future occurrences. 

Dekker (2015) identified four purposes of accident investigations: epistemological,
that is, establishing what happened; preventive, identifying pathways
to avoidance; moral, tracing the transgressions that were committed
and reinforcing moral and regulatory boundaries; and existential, finding 
an explanation for the suffering that occurred. These purposes affect the 
conduct of accident investigations. For example, the existential and moral 
needs Dekker identified, and the public policy implications Le Coze (2013) 
described, are addressed by the direct role of governments in investigations. 
Relying on government rather than industry to conduct such investigations, 
for example, satisfies the public need for answers to what happened, and the 
need for reassurance that action will be taken to address the shortcomings 
that led to the accident. Stoop and Dekker (2012), focusing on aviation acci- 
dent investigations, also note the evolution of investigations as technology 
has advanced, to where today we accept failure as “normal,” where resilient 
systems can allocate scarce safety resources as needed in response to different
system states.

Today, it can be said that investigations, particularly those of major accidents,
serve multiple functions. They serve not just to determine how the
accident developed and was caused, so that changes in the system can be
implemented to prevent similar accidents in the future, but also to meet needs that
transcend those of the accident itself. As Le Coze (2008) writes,


Accident investigations are not research works with the aim of theoris- 
ing. They are investigative projects, set up in a specific political context 
following a disaster, for understanding its circumstances and for mak- 
ing recommendations. They also serve a societal need for transparency. 
These are big projects carried out through a short period of time, often 
within months. The number of staff is important. This staff includes 
people collecting the data, advisors and consultants from university and 
industry for various aspects ranging from technical to organisational 
and human factors issues, administrative people, etc. (p. 140) 


Accident investigations, where investigators identify the factors that led 
to an accident, analyze how those factors played a role in the circumstances 
in which the accident occurred, and ultimately suggest ways to prevent 
their recurrence, call for data collection and analysis skills. Unlike empiri- 
cal research, which is overseen through peer review, theory testing, and/or 
experimental replication, major accident investigations are typically subject 
to governmental or corporate review. In addition, investigations face time 
pressures that can be considerable. Unless the investigations can be con- 
ducted quickly, the findings of the investigation could have little significance 
in terms of risk mitigation and public need. 

Further, analytical rules of accident investigations tend to be legalistic, 
using logical consistency and the preponderance of evidence. Based on the 
facts gathered, investigators develop a logical explanation of the events that 
led to an accident. This generally results in identifying errors on the part of
individual operators or operator teams (including maintenance personnel),
failures of some mechanical component or system (failures that may themselves
have been the result of operator error), and/or erroneous actions, inactions, or
shortcomings in the decisions of organizational managers. Although some
investigative agencies shy away from identifying operator errors, the practice
is still commonplace among such investigative agencies as the United States
National Transportation Safety Board, the British Air Accidents Investigation
Branch and Marine Accident Investigation Branch, and the French Bureau
d'Enquêtes et d'Analyses pour la sécurité de l'aviation civile, when investigators
believe that this is warranted. These aspects of investigations affect the
way in which investigations are conducted, by emphasizing the investigators’
ability to complete the investigation in a timely manner (i.e., “getting the job
done”), while simultaneously following rigorous rules of logic.


From Antecedent to Error to Accident


Assumptions


Several assumptions about operator error in complex systems form the foun- 
dation of the investigative approach of this book. These are 


* The more complex the task, the greater the likelihood that an error 
will be committed. 


* The more people involved in performing a task, the greater the like- 
lihood that an error will be committed. 


* People behave rationally and operate systems in a way to avoid 
accidents. 


* Errors cannot be eliminated, but opportunities for error can be 
reduced. 


Although the first two assumptions may seem rather obvious, some 
assume the contrary, that by adding steps and operators to a task the 
chances of error decrease. In fact, with certain exceptions, the opposite is 
true. As a task becomes more complex and more people are needed to per- 
form it, opportunities for error increase. In addition, operators are ratio- 
nal in that they want to avoid accidents and operate systems accordingly. 
Those who mean to cause accidents in effect intend criminal acts, which 
call for a different investigative approach than that used in this book. It 
should be noted, however, that on occasion criminal acts have been initially 
investigated as accidents, until evidence of operator planning to make the 
event appear to be an accident emerged (e.g., National Transportation 
Safety Board, 2002). 

Systems that people design, manage, and operate are not immune to the
effects of error. Because people are not perfect, designers and managers can- 
not design and oversee a perfect system and operators cannot ensure error- 
free performance. Operators of any system, irrespective of its complexity, 
purpose, or application, commit errors. As Gilbert, Amalberti, Laroche, and 
Paries (2007) observed, 


Observations of operators' practices show that they regularly make 
errors, which are therefore not abnormalities or exceptions. The errors 
are accepted, remedied or ignored. Errors are a price to pay, a necessity 
for adjustment, mere symptoms of good cognitive functioning. Errors 
(and all failures) can neither be reduced to departures from the rules, 
nor considered as abnormalities or exceptions. They are an integral part 
of habitual, normal functioning, irrespective of the level on which they 
are situated. (p. 968) 


The task of investigators, therefore, is to determine the cause of errors so
that modifications to the system can be proposed to prevent the circumstances
that led to the errors from recurring.


General Model of Human Error Investigation 


Researchers have proposed different accident causation and investiga- 
tion models, to explain how error affects operator performance in investi- 
gations. Some, like Leveson’s (2004) systems-theoretic accident model and 
processes (STAMP) model, seek to integrate accident causation analysis with 
hazard analysis and accident prevention strategies. Others, like Shappell 
and Wiegmann’s human factors analysis and classification system (HFACS) 
model (1997, 2001), which is directly based on Reason’s model of error causa- 
tion (1990, 1997), have been widely used to analyze the role of human factors 
in accident causation (e.g., Li and Harris, 2005; Schröder-Hinrichs, Baldauf,
and Ghirxi, 2011). 

However, models, largely because they are directly based on theory, may 
be difficult to apply in actual investigations. Accidents are unique events and 
investigators must be prepared to identify data to be collected and analyzed 
according to the needs of the investigation, rather than of particular theories. 
As Reason et al. (2006) note, 


Accidents come in many sizes, shapes and forms. It is therefore naive 
to hope that one model or one type of explanation will be universally 
applicable. Some accidents are really simple, and therefore only need 
simple explanations and simple models. Some accidents are complex, 
and need comparable models and methods to be analysed and pre- 
vented. (p. 21) 


Neville Moray (1994, 2000), a British human factors researcher, contends that
error in complex systems results from the elements that form the systems and that,
to investigate system errors, one must examine the pertinent elements. He outlines
these elements with concentric squares that show the equipment as a
core component of the system (Figure 2.2). These elements shape the system:


* Equipment 

* Individual operator 

* Operator team 

* Company and management 

* Regulator 

* Society and cultural factors (Figure 2.2) 


Each system component affects the quality of the system operation, and
can create opportunities for operator error. For example, the information
operators obtain about the system affects their perception of the system state.
Displayed information that is difficult to interpret can increase the likelihood
of error. Each of these elements can lead to error in itself, or can interact
with the others to create opportunities for error.

FIGURE 2.2
Moray's model of error. (From Moray, N. 1994. Human error in medicine (pp. 67-91). Hillsdale,
NJ: Erlbaum; Moray, N. 2000. Ergonomics, 43, 858-868. Copyright Taylor & Francis. Reprinted
with permission.)

To be useful for those investigating accidents, models must be practical;
if they are not, investigators will have difficulty applying them to investigations.
Models should also be simple, avoiding unnecessary complexity in explaining
error or accident causation, while still adhering to research findings on error
causation. This text
will eschew models in favor of a method that, based in the theories of both 
Moray (1994, 2000) and Reason (1990, 1997), is designed to facilitate the task 
of data identification, collection, and analysis for those investigating the role 
of human error in accident and incident investigations. 


Antecedents


Because errors are unintended, one assumes that operators want to oper- 
ate systems correctly. Using Moray’s (1994, 2000) model, with that of Reason 
(1990, 1997), their errors are considered to reflect system influences on their 
performance. That is, the operators wanted to perform well but did not 
because of shortcomings within the system. 

I refer to these characteristics as precursors or antecedents to error. As 
Reason argues, antecedents may be hidden within systems, such as in equip- 
ment design, procedures, and training, where they remain unrecognized but 
can still degrade system operators’ performance. The mechanisms by which 
each antecedent or precursor exerts its influence vary with the context and
nature of both the system element and the antecedent itself. For example, an 
antecedent may distract an operator during a critical task, hinder his or her 
ability to obtain critical information, or limit his or her ability to recall or 
apply the proper procedure. The focus of the accident investigator therefore 
should be to identify those shortcomings within the system that led to the 
accident. 

Investigators identify the presence of an antecedent in two ways: by
identifying an action, situation, or factor that influenced the operator's per- 
formance during the event, and more importantly, by obtaining evidence 
demonstrating that the operator’s performance was affected by the anteced- 
ent. The evidence, which can take many forms, will be discussed in subse- 
quent chapters. 


Antecedents and Errors 


Antecedents in complex systems contribute to errors through unrecognized 
or unacted upon shortcomings in the system. While complex systems are 
composed of a multitude of components, the elements of the system in this 
book are general, derived from the antecedents identified in both Moray’s
(1994, 2000) and Reason’s (1990) models. They can be considered latent errors
or latent conditions within the system, as well as system shortcomings and
inadequacies; in sum, any system action or decision that adversely influenced
an operator’s performance (Figure 2.3).

The errors that led to accidents and incidents, whether committed by 
operators or system managers, are either action errors, that is, someone 
did something wrong, or decision errors, that is, someone made a decision 
that proved to be erroneous. Further, because in accident causation failure 
to take an action or make a decision may be as critical to the cause of the 
accident as taking the wrong action or making a decision that proved to 
be erroneous, errors of omission should be considered as well as errors of 
commission. 

The logic used in this process will be discussed in more detail in Chapter 3,
Analyzing the Data. Keep in mind, though, that the steps to be conducted in
identifying both antecedents and errors, and relating them to the accident
or incident, are ongoing through the investigation. That is, when identifying
errors and searching for their antecedents, investigators should always keep
in mind the role antecedents may play in the critical error or errors that led
to the event under investigation.

FIGURE 2.3
Error and accident causation in complex systems.





Summary 


Complex systems are those combinations of people, materials, tools, 
machines, software, facilities, and procedures designed to work together for 
a common purpose. Perrow argues that the interactive complexity and “tight
coupling” or close interrelationships among complex system elements create
conditions that make accidents and incidents “normal.” When component 
malfunctions occur, the combination of interactive complexity and tight cou- 
pling within the system can create system states that neither operators nor 
designers had anticipated. 

Error is defined as an action or decision that results in one or more unin- 
tended negative outcomes. Perrow's work has influenced theories of error, 
and has changed the way a system's influence on operator performance is 
viewed. Where researchers had seen errors as primarily reflecting on the 
person committing them, contemporary views of error see it originating 
within the operating system. Reason likens these elements to pathogens 
residing within the body. As pathogens can cause illness when certain con- 
ditions are met, system-related deficiencies (latent errors or latent condi- 
tions) cause the normal defenses to fail and lead to an operator error, which 
causes an event. Moray delineates system elements that can lead to error in
complex systems. 

Error investigations can have many objectives and purposes, depending 
on the investigator’s perspective. The objective of an investigation should be 
to mitigate future opportunities for error by identifying the critical errors 
and their antecedents, and eliminating them or reducing their influence in 
the system. The model of error that is proposed in the book describes six 
types of antecedents, each of which, alone or in combination, can adversely 
affect operator performance and lead to an error. 


Analyzing the Data


As is so often the case when we begin to learn the complexities of a situation,
some of the issues that had seemed very clear at the outset had
become more confused. Only much later would we fully understand
the extent to which oversimplification obfuscates and complexity brings
understanding.

Vaughan, 1996
The Challenger Launch Decision: Risky Technology,
Culture, and Deviance at NASA


Introduction


Most of us routinely make judgments from available data, perhaps without 
recognizing that we have done so. We examine the behavior of our friends, 
acquaintances, and political leaders and infer motives from their behavior to 
explain them. This process—examining an action and explaining it—is the 
foundation of the human error investigator's work. 

Differences between this type of informal analysis and the more formal 
one used in error investigations result less from differences in the process 
than in the application. Unlike the informal process applied to everyday 
situations, investigative analysis is applied systematically and methodically. 
This chapter examines the principles of investigative analysis, the process in 
which error investigators identify relationships between operator errors and 
their antecedents. 





Investigative Methodology 


Accident investigation methodology and the scientific method have similar
objectives: to explain observed phenomena or events by using formal
methods of data collection and analysis. The objectives of scientific research 
correspond to those of accident investigations, “[the] systematic, controlled, 
empirical, and critical investigation of hypothetical propositions about the
presumed relations among natural phenomena” (Kerlinger, 1973, p. 11). 
Although control groups are not used in accident investigations and the pro- 
cess is not empirical, accident investigators apply a systematic and critical 
methodology to study the relationships between antecedents and errors, and 
the relationships among those errors, to determine the extent of the relation- 
ships, if any, between those errors and the incidents and accidents that the 
errors may have caused. 


Ex Post Facto Designs 


Investigators collect and analyze data after the fact, that is, after an acci- 
dent has occurred, using a method that is similar to “ex post facto” research 
designs. Here, investigators work backward after the event has occurred and 
the data have been collected, to identify and explain the nature of the vari- 
ables that led to the event. Ex post facto analytical techniques allow inves- 
tigators to effectively explain the nature of the relationships underlying 
the data and apply them well beyond the immediate circumstances of the 
event under investigation. Well-conducted investigation analyses fall within 
Vicente’s (1997) observation that, “science...encompass(es) naturalistic obser- 
vation, qualitative description and categorization, inductive leaps of faith, 
and axioms that can never be empirically tested” (p. 325). 

However, researchers have recognized that this method, although provid- 
ing critical insights into event causation, can lead to analytical inaccuracy. 
Because data are gathered after the fact, researchers and investigators can 
select from and apply a favored explanation to account for the obtained 
results, rather than be compelled to accept the explanation that the data 
offer from experimental design techniques developed before the fact (e.g., 
Kerlinger, 1973). Dekker (2002, p. 374) and others refer to this as hindsight 
bias, a tendency in accident investigations to lead investigators to, as he 
writes, make “tangled histories” of what operators were dealing with at the 
time of an accident “by cherry-picking and re-grouping evidence” to fit their 
view of what transpired in the accident. However, knowledge of an opera- 
tor’s error and the accident that occurred as a result need not necessarily lead 
to hindsight bias. In fact, investigators as a matter of course recognize that
their job calls on them to explain errors from the perspective of the person 
who committed them, because doing so allows a proper analysis to be con- 
ducted of the system flaw that led to the error. 

Nonetheless, error investigators compensate for this potential limitation 
because they typically obtain data on many measures, data that had been 
continuously collected throughout the event, unlike researchers who gener- 
ally collect data on only a few parameters, often only at selected intervals, 
and under highly controlled conditions. Further, investigators examine real 
world behavior under conditions that could not reasonably be examined in 
controlled settings. Thus, by collecting considerable data about an event, 
subjecting the data to objective and systematic analysis, and by being sensitive
to the possibility of hindsight bias, investigators can avoid allowing
hindsight bias to affect their analyses.


Imprecision 


The logic of error investigations assumes a direct relationship between one 
or more system deficiencies or shortcomings, and the critical error or errors 
that led to an accident. The previous chapter described the basic model that 
this text follows, that is, antecedent leading to error, which then leads to an 
accident; subsequent chapters will describe the particular antecedents that 
investigators need to examine, and their potential influence on operator 
performance. 

Although the investigative process is systematic, it is still affected by 
the skills and experience of the particular investigator. For this and other 
reasons, some have shied away from definitive identifications of accident 
“causes.” As noted in Chapter 2, “there is no absolute cause” of an accident 
because imprecision is an inherent part of error investigations. Absolute cer- 
tainty in establishing the errors leading to an event is an impossibility. 

Klein, Rasmussen, Lin, Hoffman, and Cast (2014) referred to the expla- 
nation of behavior as “indeterminate causation,” which, as they write, “is 
involved in the anticipation or explanation of human belief and activity” 
(p. 1381). Moreover, they note that when dealing with explanations of events
involving human behavior, for example, why a sports team lost, 


...no amount of analysis can establish the “actual” cause or single
cause or “root” cause. There are no single or uniquely correct answers 
to such questions, and no amount of research would uncover the one 
“real” cause or the “objective” cause, because there is no such thing. 
(p. 1381) 


As a result, some investigative agencies use the term “probable cause” of 
an accident rather than the more succinct and absolute “cause,” acknowledg- 
ing that, despite their best investigative and analytical efforts, the influence 
of an unidentified variable remains a possibility. Some agencies do not use 
the term “cause” at all in their investigations but list findings instead, as does 
the Australian Transport Safety Bureau (2007). Some have criticized the use 
of any type of cause in an investigation. Miller (2000), for example, suggests 
that this, 


Relates back to a subject pursued for the past quarter century or so— 
that detestable preoccupation most people seem to have with “cause.” If 
investigative processes and classifications of accident findings continue 
to be hung up on “cause” instead of pursuing the implementation phase 
[of remediation strategies and techniques] further, we are going to be 
static at best in prevention efforts. (p. 16) 

Yet a comparison of the results of incident and accident investigations
by organizations that determine a cause and those that do not suggests
little difference between them. Whether an organization determines
a probable cause or not appears to make little difference to the quality of 
the investigation or its proposed recommendations. Irrespective of a require- 
ment to develop a cause to an event, the key focus for investigators should 
be on conducting a thorough and systematic investigation in order to reduce 
future opportunities for error. Doing so will result in effective investigations, 
regardless of the nature of the “cause” or “findings” that are determined. As 
Klein et al. (2014) note, “regardless of which causes are invoked, an explana- 
tion has to adopt a format or argument structure for characterizing these 
causes” (p. 1381). 


An Illustration 


A hypothetical accident illustrates the process. Assume that a train failed to 
stop at a stop signal (also referred to as an “aspect”) and struck another train 
that had been standing on the same track. The locomotive engineer had an 
unobstructed view of the signal. 

The engineer claimed that he observed a stop signal and applied the brakes, 
but the brakes failed. If he is correct, investigators will have to identify a
mechanical malfunction as the cause of the accident; otherwise they would
unfairly fault an operator who performed well and, worse from a safety
standpoint, fail to address the hazards that led to the accident in the first place.
However, before they could accept the engineer’s explanation as the most 
likely cause of the accident, investigators would have to test and accept the 
viability of several possible conclusions that are necessary to accept a failed 
brakes explanation. These are 


1. The brakes were defective at the time the engineer claims to have 
applied them 


2. Brakes with this defect would be unable to stop a comparable train 
traveling at the same speed, in the same distance, on the same track 
section 


3. Other possible malfunctions that could also have failed to stop a 
comparable train traveling at the same speed, in the same distance, 
on the same track section, were not identified 


Thus, investigators are faced with only two possible alternatives to the 
cause of the accident, assuming that signals, track, and other train systems 
were not involved. Either the engineer failed to properly apply the brakes, 
or he applied them correctly but a mechanical malfunction prevented the 
brakes from stopping the train. To determine which of these conclusions is 
supported, investigators would need to collect a variety of system data. If the 
data supported these conclusions, they could be reasonably confident that 
defective brakes caused the accident. If not, other explanations would need
to be proposed, and the data reexamined and reanalyzed. The data would 
either support or refute the proposed explanations. 
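
The logic of this test can be sketched in a few lines of Python. This is only an illustration, not a tool used by investigators: the `Condition` class, the function names, and the True/False findings are all assumptions introduced for the example. The point is that the failed-brakes explanation stands only if every one of the necessary conclusions listed above is supported by the data.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    description: str
    supported: bool  # True if the collected data support this conclusion


def explanation_supported(conditions):
    """An explanation stands only if every necessary condition is supported."""
    return all(c.supported for c in conditions)


def unsupported(conditions):
    """Return the descriptions of the conditions the data failed to support."""
    return [c.description for c in conditions if not c.supported]


# Hypothetical findings; the True/False values are invented for illustration.
brake_failure = [
    Condition("Brakes were defective when the engineer applied them", True),
    Condition("The defect would prevent a comparable train from stopping", True),
    Condition("No other malfunction that could explain the failure was identified", False),
]

print(explanation_supported(brake_failure))
print(unsupported(brake_failure))
```

In this invented case the third condition is not supported, so the explanation would have to be revised and the data reexamined, as the text describes.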


Analysis Objectives 


As discussed in Chapter 2, investigators bring their own perspectives to 
the analysis, depending upon their employer, their values, and the like. The
investigation objective endorsed in this text is to identify the errors and
their antecedents that led to the occurrence being investigated, so that future
opportunities for error can be reduced or eliminated. Investigators should examine the
collected data to meet this objective until they are confident that the identi- 
fied relationships conform to criteria that will be discussed shortly. 

During an investigation, it is likely that investigators will collect different 
types of data of varying quality. Before analyzing the data, they evaluate the 
collected data to assess their value in the investigation. Not all data are of 
equal value and some types of data should be given more consideration than 
other types. 





Assessing the Quality of the Data 


Some of the data that investigators collect will pertain to the investigation 
objective while other data may not; some data sources will be complete and 
others not. Including incomplete data and data that do not address the ante- 
cedents of error in the analysis will lead to an analysis that contributes little 
to understanding the origin of the particular errors, or worse, is incorrect. 
Determining the quality of data is critical because the effectiveness of an 
investigation largely depends on the quality of the data that investigators 
collect. “Garbage in, garbage out” applies to the analysis of error in incidents
and accidents as it does to other types of analysis. Two standards of quality
are used to assess data value: internal consistency and sequential consistency.


Internal Consistency 


Anderson and Twining (1991), describing legal analysis, believe that inter- 
nally consistent data should converge into one conclusion. Converging data, 
they argue, even if derived from different sources and collected at differ- 
ent times, support the same conclusion. For example, if an operator's per- 
formance history reveals deficiencies and those deficiencies are similar 
to characteristics of the operator's performance at the time of the occur- 
rence, the data converge. In that instance, one could reasonably conclude 
that the operator's performance during the event was consistent with his 
performance in previous, similar circumstances and not an aberration. In
complex systems, internally consistent data converge by depicting different 
aspects of the same event similarly, at the same points in time. If they do not, 
the data will not be internally consistent. 

In the hypothetical railroad accident in which defective brakes are sus- 
pected of having caused the collision, one can assume that at a minimum, 
investigators will collect data regarding 


* The operator's train orders and his or her interpretation of them
* The operator's speed, power, and brake application commands
* The actual train speed, power, and brake settings
* Pertinent operating rules, procedures, and operating limitations
* The operator's training and performance record
* Toxicological analysis of specimens of the operator
* The operator's sleep/wake history before the accident
* The operator's medical history and medication use
* The commanded and displayed signals
* Lights, flags, or markings at the aft end of the standing train
* Company oversight of its operations
* The regulator's history overseeing the railroad


If the brakes had been defective and investigators determined that the 
defect caused the accident, internally consistent data should reveal the 
effects of the defect among a variety of types of data. All data, except those 
pertaining to the brakes and those independent of the sequence of anteced- 
ents and errors/flaws leading to the event, should be consistent. However, 
if the data showed defects in other components that could have altered the 
sequence of occurrences, or if the brakes were found to have been defect 
free, the data would be inconsistent and the discrepancy would need to be 
resolved. 

Inconsistencies could be caused by deficiencies either in the data or in the 
proposed theory or explanation of the cause of the event. Deficiencies in the 
equipment-related data could result from flaws in the recording devices, 
measuring instruments, or, with eyewitnesses, in their perceptions and 
recall of the event. Inconsistencies in operator-related data could be caused 
by any of several factors that will be discussed shortly. Otherwise, inconsis- 
tent data indicate the need to revise the theory or explanation of the cause of 
the accident, to reexamine the data, or to collect additional data. 

Likely sources of inconsistent data. Inconsistencies among the data, though 
rare, are most often found among eyewitness accounts and operator-related 
information. Substantial differences among eyewitness accounts are infre- 
quent, but, as investigators found in the explosion of the Boeing 747 off the 
coast of Long Island, occur occasionally (National Transportation Safety
Board, 2000a). Inconsistencies in eyewitness data largely result from percep- 
tual and memory factors, and from differences in interviewer techniques, 
topics that will be discussed in Chapter 11. 

Several factors may explain differences in operator-related information. 
For one, people interact differently with operators than they do with oth- 
ers, based on their relationships with them. Colleagues, acquaintances, and 
supervisors have different perceptions of the operator than would his or her 
family members, and these perceptions will affect the information they give 
interviewers. In addition, as discussed in Chapter 14, changes that occur 
over time in such parameters as measures of operator performance and 
health may also lead to inconsistent data. 

Investigators can safely discard inconsistent data, if the inconsistency is 
not a result of deficiencies in the way the data were collected and if it can 
be safely attributed to factors related to investigation shortcomings or to the 
event itself. Investigators of the 1999 collapse of logs being prepared for a
bonfire at Texas A&M University, which resulted in 12 deaths, discarded
numerous eyewitness reports that were not supported by the physical evidence or
were otherwise irrelevant (Packer Engineering, 2000; Special Commission,
2000). As the investigators, who used the term Bonfire to refer to the stack of
logs, describe,


A large number of interview summaries prepared by Kroll [the orga- 
nization that conducted the interviews] contained information which 
was either not in agreement with the physical evidence, or not directly 
related to the Bonfire collapse. These summaries were not included with 
Packer's [the organization that conducted the physical examination of 
the logs and the] analysis. Of the remaining summaries, those contain- 
ing information from witnesses who were physically on Bonfire at the 
time of the collapse were considered most accurate, while those of wit- 
nesses at Bonfire but not on the actual stacks were also considered highly 
accurate. (Packer Engineering, p. 25) 


Because the physical evidence contradicted many of the eyewitness 
accounts, and because the inconsistencies between the eyewitness reports
and the other data did not result from factors related to the event or inves- 
tigation shortcomings, investigators could confidently discard the inconsis- 
tent eyewitness data without affecting the quality of the subsequent analysis 
and the strength of the findings and conclusions. 


Sequential Consistency 


Investigative data should consistently match the sequence of occurrences 
and the period of time in which they occurred. The sequential relationships 
between antecedents and errors are invariant; antecedents will always pre- 
cede errors and errors will always precede the event. 

In the railroad accident example used earlier, if a signal commands a stop,
locomotive event recorders would be expected to show, in order, power 
reduction first and then brake application, corresponding to the order of 
the expected operator actions. The data should also match the passage of 
time corresponding to the occurrence, in the actual period in which the train 
approached the signal and struck the standing train. Regardless of the rate 
at which actions occur and system state changes, the two should correspond. 
Specific operator actions must still occur in certain orders and within spe- 
cific periods of time, after certain events have taken place. Further, specific 
operator actions should precipitate specific equipment responses. 

Sequentially inconsistent data may be the result of inaccurate data record- 
ers, defective measuring devices, or deficiencies within the data. If the incon- 
sistencies cannot be resolved satisfactorily, investigators may need to collect 
additional data, or reexamine the data selection and collection methods to 
resolve the inconsistencies. 
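
A check for sequential consistency can be sketched as follows. This is a minimal illustration under stated assumptions: the event names, the timestamps, and the function `is_sequentially_consistent` are hypothetical and do not come from any actual event recorder format. The sketch tests only whether the recorded events appear in the order the text describes, with power reduction preceding brake application and both preceding the collision.

```python
def is_sequentially_consistent(records, expected_order):
    """Check that the expected events were recorded in the expected order.

    records: list of (seconds, event_name) tuples from a recorder.
    expected_order: event names in the order they should have occurred.
    """
    times = {}
    for seconds, name in records:
        # Keep the first time each event was recorded.
        times.setdefault(name, seconds)
    if any(name not in times for name in expected_order):
        return False  # a required event was never recorded
    ordered = [times[name] for name in expected_order]
    return all(earlier < later for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical event names and times, invented for illustration.
EXPECTED = ["stop_signal_displayed", "power_reduction", "brake_application", "collision"]

recorder = [
    (0.0, "stop_signal_displayed"),
    (4.2, "power_reduction"),
    (6.8, "brake_application"),
    (31.5, "collision"),
]

print(is_sequentially_consistent(recorder, EXPECTED))
```

A False result would not by itself show who or what was at fault. It would only flag that the recorder data, the measuring devices, or the proposed sequence needs to be reexamined.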





Data Value 


Data vary in their value and contribution to the investigation. Depending on 
the event and the data, investigators may rely on some data to understand 
what happened and why and ignore other data. The greater the reliability, 
accuracy, and objectivity of the data, the greater their value to, and influence 
upon, the analysis. Reliable and objective data from different sources should 
describe the same phenomenon the same way, albeit from different perspec- 
tives, regardless of their sources. 

In general, “hard” data, data obtained directly by the system, contribute 
substantially to the investigation because of their high reliability, objectiv- 
ity, and accuracy. By contrast, the value of “soft data,” such as eyewitness 
accounts and interview data, is less because the data can change as a function 
of the person collecting the data, the time of day the data are obtained, and 
the skill of the interviewer or person collecting the data, among other factors. 


Relevance 


Anderson and Twining (1991), referring to legal analyses, consider a state- 
ment relevant if it tends to make the hypothesis to be proven more likely to 
be supported than would otherwise be the case. Data that can help explain 
conclusions regarding the cause of the event, the critical errors, and the ante- 
cedents to the errors, are analogous to data that can support the hypothesis 
and are considered relevant to the investigation. 

Most investigators routinely gather data that may not necessarily relate 
to their investigations but are needed to rule out potential explanations or 
factors. If it is determined that an operator did not commit an error, one can
exclude from the analysis data that pertain to the operator's performance history
without degrading the quality of the analysis or the investigation, unless
the data relate to other critical issues. On the other hand, if operator error is 
believed to have led to the incident, almost all data concerning the operator 
would be considered relevant and therefore would be included in the analysis. 

Data relevance can change as more is learned about an event. For example, 
an initial focus on potential training deficiencies makes information
pertinent to the development, implementation, and conduct of the training
relevant to the investigation. If the data suggest that equipment design factors
rather than training affected operator performance, operator training-related 
data would be less relevant. 


Quantity 


The more data obtained about a particular aspect of the system, the more 
confidence one can have in the value of the data and their contribution to the 
analysis. For example, in some systems, multiple recorders capture a vari- 
ety of operator performance parameters, documenting the operator’s spoken 
words and any related sounds. These provide a considerable amount of data 
that describe, both directly and indirectly, what the operator did before and 
during the event. 

If few data are available, other measures that can approximate the
parameters of interest should be sought. If no data directly describe aspects 
of operator performance, investigators may need to learn about operator 
actions from other sources, such as from system recorders. If there are insuf- 
ficient data available to allow inferences about the parameters of interest, 
conclusions regarding the data of interest will have little factual support. 





Identifying the Errors 


After the data have been examined and evaluated, one can begin to propose 
relationships among antecedents, errors, and the causes of the event. 


The Sequence of Occurrences 


To begin developing the critical relationships, first establish the sequence 
of actions and occurrences in the event. The sequence will determine the 
order of actions and decisions, and facilitate the task of identifying the criti- 
cal relationships. 

Establish the sequence of occurrences in the event by working backward 
from the event itself until the errors that led to the event, and the antecedents 
to those errors, are reached—what Rasmussen, Pejtersen, and Goodstein 
(1994) refer to as the “stopping point.” Regardless of the event, whether an 
airplane accident, chemical refinery explosion, or vessel grounding, stop col- 
lecting data and analyzing the data at the point at which the sequence of 
occurrences that led to the incident or accident begins. 

Using the railroad accident discussed earlier, the sequence of occurrences 
begins with the collision. Working backward from the event, occurrences 
earlier in the sequence would likely include the engineer’s brake application 
and power reduction, and progress to company brake maintenance practices, 
going as far back as brake manufacture and locomotive assembly. 

The sequence of occurrences includes major system elements. In this illus- 
tration, these would include the operator, the railroad, the regulator, and the 
brake system. However, a few issues should be ruled out early in the inves- 
tigation. Data pertinent to those issues need to be collected to determine the 
role of each element in the event. 

For example, if it is learned that the locomotive engineer did not apply 
the brakes properly, then operator actions would be a focus of the investiga- 
tion and investigators would need to identify potential antecedents to those 
actions. Other issues to be investigated would likely include the railroad’s 
training and oversight of its operators, and the regulator’s oversight of the rail- 
road. Although each accident is unique with its own set of occurrences, the crit- 
ical facts, in this instance the collision, the record of inspections of the brakes 
and their manufacture, would not be in dispute. A list of an initial sequence 
of occurrences of the hypothetical railroad accident is illustrated below. 


Sequence of Occurrences—Beginning with the Collision 


1. The collision
2. Locomotive operator brake application
3. Locomotive operator power reduction
4. Railroad signal system maintenance and inspection
5. Locomotive operator initial and refresher training
6. Railroad brake system maintenance and inspection
7. Railroad brake system maintenance personnel selection practices
8. Brake system manufacture and installation
9. Railroad signal system selection and acquisition
10. Signal system manufacture
11. Railroad signal system selection and installation
12. Railroad signal installer, maintenance, and inspection personnel training
13. Locomotive operator selection
14. Railroad brake system maintenance personnel selection practices
15. Railroad signal system installer, maintenance, and inspection personnel selection practices
16. Regulator oversight of railroad signal system
17. Regulator oversight of brake system


Let's assume that in this example, after interviewing critical personnel and 
collecting and examining the data, investigators determine that the operator 
performed satisfactorily. In that case, data relating to operator performance 
history can be safely excluded from the sequence of occurrences and from 
subsequent data analysis. The results of a second iteration of a sequence of 
occurrences, after first discarding occurrences irrelevant to the issues of 
interest can be seen in the list below. 


Events Excluded 


1. Locomotive operator brake application 

2. Locomotive operator power reduction 

3. Locomotive operator initial and refresher training 
4. Locomotive operator selection 


Events Retained 


1. The collision
2. Railroad signal system maintenance and inspection
3. Railroad brake system maintenance and inspection
4. Railroad brake system maintenance personnel selection
5. Brake system manufacture and installation
6. Railroad signal system selection and acquisition
7. Signal system manufacture
8. Railroad signal system selection and installation
9. Railroad signal installer, maintenance, and inspection personnel training
10. Railroad brake system maintenance personnel selection practices
11. Railroad signal system installer, maintenance, and inspection personnel selection practices
12. Regulator oversight of railroad signal system
13. Regulator oversight of brake system
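

The narrowing step can be sketched as a simple partition of the initial sequence. The sketch below is illustrative only: tagging each occurrence with a system element ("operator," "signal," "brake," "event") is an assumption made for the example, as are the names `OCCURRENCES` and `partition`. Once an element is ruled out, every occurrence tagged with it moves to the excluded list.

```python
# Each occurrence from the initial sequence is tagged with a system element.
# The tags are an assumption made for this sketch.
OCCURRENCES = [
    ("The collision", "event"),
    ("Locomotive operator brake application", "operator"),
    ("Locomotive operator power reduction", "operator"),
    ("Railroad signal system maintenance and inspection", "signal"),
    ("Locomotive operator initial and refresher training", "operator"),
    ("Railroad brake system maintenance and inspection", "brake"),
    ("Railroad brake system maintenance personnel selection practices", "brake"),
    ("Brake system manufacture and installation", "brake"),
    ("Railroad signal system selection and acquisition", "signal"),
    ("Signal system manufacture", "signal"),
    ("Railroad signal system selection and installation", "signal"),
    ("Railroad signal installer, maintenance, and inspection personnel training", "signal"),
    ("Locomotive operator selection", "operator"),
    ("Railroad brake system maintenance personnel selection practices", "brake"),
    ("Railroad signal system installer, maintenance, and inspection personnel selection practices", "signal"),
    ("Regulator oversight of railroad signal system", "signal"),
    ("Regulator oversight of brake system", "brake"),
]


def partition(occurrences, ruled_out):
    """Split occurrences into (retained, excluded) given ruled-out elements."""
    retained = [name for name, element in occurrences if element not in ruled_out]
    excluded = [name for name, element in occurrences if element in ruled_out]
    return retained, excluded


# Investigators determine that the operator performed satisfactorily.
retained, excluded = partition(OCCURRENCES, {"operator"})
print(len(retained), len(excluded))
for name in excluded:
    print("excluded:", name)
```

Ruling out the operator removes four occurrences and leaves thirteen, which matches the counts in the lists above.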


The Error or Errors 


After examining the data, assessing their relative value, and establishing the 
sequential order of occurrences, investigators can exclude from the analysis 
several additional factors that would no longer be considered relevant to the
accident. For example, if tested and found to have been in acceptable condi- 
tion at the time of the event, factors related to the signal system may now be 
considered irrelevant. 

The next step in the data analysis can now be conducted, identifying the 
errors that led to the event, perhaps the most critical step in the analysis. 
This step is distinct and separate from the formal or legal determination of 
the accident cause. The focus should be on the errors suspected of leading 
to the event. 





Assessing the Relationship of Antecedents to Errors 


After identifying the errors, the antecedents of those errors must be deter- 
mined. The process is largely inferential, based on investigative logic regard- 
ing the relationship between the two. The evidence consists of the nature of 
the error, and information from written documentation, interviews, system 
recorders, equipment, and other sources. 


Inferring a Relationship 


A relationship between antecedent and error must be logical and unambigu- 
ous. Investigators must establish that the antecedent, either by itself or with 
others, influenced the operator’s performance so that he or she committed an 
error. To identify the antecedent, one should ask a counterfactual question:
would the operator have committed the error if this (and other) antecedent(s) 
had not preceded it? If the answer is no, one could be confident that the 
antecedent led to the error. Counterfactual questions are central to analyzing 
error data in investigations. 

Assume that insufficient operator experience is one of several antecedents 
that affected the performance of an operator, and the operator misinterpreted 
system-related data as a result. A relationship between experience in operat- 
ing a system and the error of misinterpreting data is logical; a more experi- 
enced operator is less likely to commit the same error than a less experienced 
one. This conclusion is supported by research findings and the determina- 
tions of previous accident investigations. This relationship between anteced- 
ent and error is clear and unambiguous, reached only after the necessary 
facts have been obtained and analyzed. 


Statistical Relationship 


The logic used to establish a relationship between antecedents and errors 
is analogous to multiple regression analysis, a statistical technique used to 
determine the relationship between one or more predictor variables and a
single variable (e.g., Harris, 1975). Economists, for example, employ multiple 
regression analysis to predict the combined effects of changes in variables 
such as the prime interest rate, unemployment, and government spending, 
on changes in an outcome variable such as the inflation rate.

The stronger the relationship between the predictor or influencing vari- 
ables and the outcome variable, the higher the correlation between the two 
sets of variables. In relationships that have high positive correlations (say 
0.60 or higher since correlations of plus or minus one are the limits of cor- 
relational strength), changes in the predictor variables are associated with 
corresponding changes in the outcome variables. As the value of predictor 
variables increases or decreases, the value of the outcome variable similarly 
increases or decreases. If the correlations are negative, predictor vari- 
able changes in one direction would be associated with outcome variable 
changes in the opposite direction. As the predictor variables increase or 
decrease in value, the outcome variable loses or gains value in the opposite 
direction. 

Multiple regression analyses also describe another facet of these relationships
that can be stated statistically: when the correlation between the two
sets of variables is high, the predictor variables account for much of the total
variance in changes in the outcome variable. That is, the higher the correla- 
tion between the two, the more that changes in the predictor variables—and 
not some other variable or the effects of chance—are associated with changes 
in the outcome variable. The lower the correlation, the less that changes in 
the outcome variable can be attributed to changes in the predictor variables. 
In that case, changes in the outcome variable will more likely be associated 
with variables that had not been considered in the analysis. 
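
For readers who want the statistical statement written out, the standard textbook form of the multiple regression model and of the proportion of variance accounted for is given below. The notation is not from this text and is supplied here only as an aid: y is the outcome variable, x_1 through x_k are the predictor variables, the beta terms are the regression weights, epsilon is the unexplained residual, y-hat is the value predicted by the model, and y-bar is the mean of the observed outcomes.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

In an ordinary least squares regression that includes an intercept, R squared is the square of the multiple correlation R between the predictors and the outcome. A multiple correlation of 0.60, for example, corresponds to R squared of 0.36, meaning the predictors account for 36 percent of the variance in the outcome and the remainder is associated with chance or with variables not in the analysis.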

In investigations of error, the predictor variables correspond to the ante- 
cedents and the outcome variable to the critical error. Investigators assess 
the relationship between one or more antecedents and the operator's error in 
the circumstances that prevailed at the time of the accident. The stronger the 
relationship between the antecedents and errors, the more the antecedents 
would account for “variance” about the errors, and the more the error can be 
attributed to those antecedents, and not to other variables or antecedents not 
yet recognized. 


Relating Antecedents to Errors 


In short, and as mentioned, relationships between antecedents and errors 
need to meet three critical criteria: (1) they should be simple, (2) logical, and
(3) superior to other potential relationships among the variables. These cri- 
teria are related; if a relationship meets one criterion, it will likely meet the 
others as well. 

The influence of an antecedent variable on the error should be as simple as 
possible. One should be directly related to the other, with as few assumptions 
as possible needed to support it. A simple relationship should also be logi- 
cal, one that makes sense to all concerned. It should require little analytical 
effort to understand the relationship between one and the other. In addi- 
tion, it should be simpler and more logical than other, alternative proposed 
relationships. 


Counterfactual Questions 


To determine with confidence that a proposed error has contributed to the 
cause of the event, ask a counterfactual question: would the accident have
occurred if this error had not been committed? If the answer is no, the acci- 
dent would not have occurred, one can be confident that the error caused or 
contributed to the cause of the accident. 

Using the train collision illustration, assume that (1) the brake defect 
resulted from a maintenance error and (2) the defect was sufficiently con- 
spicuous that inspectors should have noticed it during routine inspections, 
but they did not. In addition to the errors of those involved in the brake 
maintenance, the investigation would also examine the inspectors’ errors 
and consider them contributory to the accident. In this accident, if neither 
error had been committed, the accident would not have occurred. Both 
errors are needed for the accident to occur, and each can be considered to 
have led to the accident. If the maintenance error has been identified, the list 
of relevant occurrences to be retained can be further narrowed, with a con- 
comitant expansion of the list of those excluded, as illustrated below. This 
list includes the accident itself, the errors that directly led to it, as well as the 
antecedents that may have allowed the errors to occur. 


Events Excluded 


1. Locomotive operator brake application
2. Locomotive operator power reduction
3. Railroad signal system maintenance and inspection
4. Locomotive operator initial and refresher training
5. Railroad signal system selection and acquisition
6. Signal system manufacture
7. Railroad signal system selection and installation
8. Railroad signal installer, maintenance, and inspection personnel training
9. Locomotive operator selection
10. Railroad signal system installer, maintenance, and inspection personnel selection
11. Regulator oversight of railroad signal system


Events Retained


1. The collision 

2. Brake system manufacture and installation 

3. Railroad brake system maintenance and inspection 

4. Railroad brake system maintenance personnel training 
5. Railroad brake system maintenance personnel selection 
6. Regulator oversight of brake system 
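

The counterfactual test applied to the train collision can be sketched as follows. The model is deliberately simplistic and entirely hypothetical: the function `accident_occurs` encodes the assumption stated in the illustration that the collision required both the maintenance error and the inspectors' failure to notice the defect, and the names used are invented for the example. For each candidate error, the sketch asks whether the accident would still have occurred had that error not been committed.

```python
def accident_occurs(errors):
    """Hypothetical model: the collision requires both errors to be present."""
    return errors["maintenance_error"] and errors["inspection_error"]


def contributed(error_name, actual):
    """Counterfactual test: remove one error and see if the accident still occurs."""
    counterfactual = dict(actual)
    counterfactual[error_name] = False
    return accident_occurs(actual) and not accident_occurs(counterfactual)


# What investigators are assumed to have found in the illustration.
actual = {
    "maintenance_error": True,
    "inspection_error": True,
    "operator_error": False,
}

for name in actual:
    print(name, contributed(name, actual))
```

Under these assumptions the maintenance error and the inspection error each pass the test, while the operator's actions do not, which mirrors the lists of retained and excluded occurrences.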





Multiple Antecedents 


In complex systems, multiple antecedents often influence operator per- 
formance. Multiple antecedents can affect performance cumulatively, by 
increasing the influence of each to bring a greater total influence on operator 
performance than would otherwise be the case, and they can interact with 
each other to differentially affect performance. Investigators should search 
for the presence of multiple antecedents, even if one antecedent appears to 
adequately explain the error. 


Cumulative Influence 


Multiple antecedents can increase each antecedent’s influence on opera- 
tor performance so that their cumulative total influence is greater than 
would otherwise be true. For example, individual antecedents of fatigue 
can cumulatively influence performance beyond that of individual ante- 
cedents, as investigators found in a 1998 accident involving a commercial 
bus. The bus driver fell asleep while at the controls, and the bus ran off the 
road and struck a parked truck as a result (National Transportation Safety 
Board, 2000b). 

Investigators identified three antecedents of the driver’s fatigue.
Individually, each may have been insufficient to cause him to fall asleep
while operating the vehicle, but combined, their effects were substantial.
Toxicological analysis of a specimen from the driver’s body revealed the
presence of an over-the-counter sedating antihistamine that he had consumed
earlier to treat a sinus condition. He had also worked at night for several
consecutive days before the accident, after having maintained a daytime
awake/nighttime asleep pattern, a schedule change that had disrupted his
sleep patterns and caused a sleep deficit. Further, the accident occurred at
4:05 a.m., a time when he would ordinarily have been in his deepest phase of
sleep. Those who stay awake at that time are especially prone to the effects
of fatigue. Combined, the effects of the sedating antihistamine, disruptive
schedule, and time of day were sufficiently powerful that the driver was
unable to stay awake.


Interacting Antecedents 


Interacting antecedents can differentially affect operator performance. 
That is, two or more antecedents together will affect performance differ- 
ently than the antecedents would have if acting on their own. To illustrate, 
assume that the control rooms of two electrical power generating stations, 
designed 5 years apart, are identical in all respects except that one employs 
“older” analog gauges and the other “newer” digital displays to present
system information. The same information is shown in both, and in both
generating stations the operators have received identical training and use
identical procedures.

The operators of the two generating stations also have different levels of 
experience; one group has an average of 10 years of experience and the other, 
2 years. Thus, four different operator/equipment groups are possible: 


1. Experienced operators with “old” analog displays 

2. Inexperienced operators with “old” analog displays 
3. Experienced operators with “new” digital displays 
4. Inexperienced operators with “new” digital displays 


Further, in a certain nonroutine situation, the displays present information 
that requires the operators to respond. Only one of two responses is pos- 
sible for that situation, either correct or incorrect. With no interaction, differ- 
ences in operator response would be affected either by their experience or by 
the display type, or there would be little or no difference in their responses. 
Inexperienced operators might respond erroneously while experienced ones 
would not, or operators working with the “newer” displays could respond 
correctly while the others would not. Alternatively, with no interaction all four
groups could perform correctly or all could commit errors, in which case the 
effects of either operator experience or display type would lead to perfor- 
mance that is independent of the other. Figures 3.1 through 3.5 illustrate five 
of the possible outcomes. 

An interaction occurs when experience and display type interact to dif- 
ferentially affect operator performance. Operators committing the greatest 
number of errors could be the inexperienced ones who worked with the 
“older” displays. Alternatively, experienced operators working with the 
“newer” technology could commit the greatest number of errors, and the 
inexperienced operators working with analog displays, the fewest. 
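
The distinction between noninteracting and interacting antecedents can be
sketched in a few lines of code. The example below is only an illustration
under simplifying assumptions: each of the four operator/equipment groups is
reduced to a single correct or incorrect response, and the function name
`classify` and its labels are hypothetical, not taken from any investigative
standard.

```python
from typing import Dict, Tuple

# Outcome table keyed by (display type, experience level).
# True means the group responded correctly; False means it erred.
Outcomes = Dict[Tuple[str, str], bool]


def classify(outcomes: Outcomes) -> str:
    """Label a 2x2 outcome table (hypothetical helper for illustration)."""
    a_exp = outcomes[("analog", "experienced")]
    a_inexp = outcomes[("analog", "inexperienced")]
    d_exp = outcomes[("digital", "experienced")]
    d_inexp = outcomes[("digital", "inexperienced")]

    # All four groups perform the same: neither antecedent matters.
    if a_exp == a_inexp == d_exp == d_inexp:
        return "no effect"
    # Display type makes no difference within either experience level.
    if a_exp == d_exp and a_inexp == d_inexp:
        return "experience only"
    # Experience makes no difference within either display type.
    if a_exp == a_inexp and d_exp == d_inexp:
        return "display type only"
    # The effect of one antecedent depends on the level of the other.
    return "interaction"


# Experienced operators correct on both displays, inexperienced incorrect.
experience_effect = {
    ("analog", "experienced"): True, ("analog", "inexperienced"): False,
    ("digital", "experienced"): True, ("digital", "inexperienced"): False,
}
# Experienced operators err only with analog, inexperienced only with digital.
crossed = {
    ("analog", "experienced"): False, ("analog", "inexperienced"): True,
    ("digital", "experienced"): True, ("digital", "inexperienced"): False,
}

print(classify(experience_effect))
print(classify(crossed))
```

A table in which only one group errs, such as inexperienced operators with
the “older” displays, is also labeled an interaction by this sketch, because
the effect of display type then depends on operator experience.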

            Experienced   Inexperienced
Analog      Correct       Incorrect
Digital     Correct       Incorrect
FIGURE 3.1
Noninteracting antecedent: Operator experience.

            Experienced   Inexperienced
Analog      Correct       Correct
Digital     Incorrect     Incorrect
FIGURE 3.2
Noninteracting antecedents: Display type.

            Experienced   Inexperienced
Analog      Incorrect     Incorrect
Digital     Correct       Correct
FIGURE 3.3
Noninteracting antecedents: Display type.

            Experienced   Inexperienced
Analog      Incorrect     Correct
Digital     Correct       Incorrect
FIGURE 3.4
Interacting antecedents: Experience and display type.

            Experienced   Inexperienced
Analog      Correct       Incorrect
Digital     Incorrect     Correct
FIGURE 3.5
Interacting antecedents: Experience and display type.

The variety of human behavior, the diversity among procedures, training,
and equipment, and the numerous component interactions within complex
systems are such that the potential number of interacting antecedents
that could affect performance is practically infinite. For example, training 
can interact with procedures or operating cycles so that certain types of 
training, say on-the-job training and classroom lectures, lead to different 
levels of performance, according to the particular procedure and operat- 
ing cycle. Oversight may interact with managerial experience so that cer- 
tain types and levels of oversight lead to superior operator performance. 
Less experienced operators may perform best with extensive oversight, and
experienced operators may perform best with little oversight. Some operators
may perform effectively with certain types of controls, but erroneously
with others, according to the type of training they receive. Because of the
possible presence of interacting antecedents, investigators should continue
searching for error antecedents as long as reasonably possible, even after
identifying one or two that appear to have played a role in the error in
question.


Concluding the Search for Antecedents 


Despite thorough evidence gathering and sound analysis, investigators may 
experience some uncertainty regarding the antecedents that were identi- 
fied. “Did I overlook something?” is a question that investigators often ask 
themselves. Statistical and experimental design techniques help empirical
researchers reduce the role of unidentified variables, but even these
techniques cannot exclude the possibility that something not identified
influenced the obtained results. Researchers strive to control the variables
they can identify, but because unidentified variables may always be present,
absolute certainty is not possible. Rather, researchers rely on tests of
statistical probability, in which the influence of randomly acting variables
is measured and, if sufficiently low, acknowledged but considered unlikely
to have affected the results.

Accident investigators must also acknowledge the possibility that ante- 
cedents that they had not identified contributed to the critical errors. 
Unidentified antecedents are always potential factors in investigations. 
Nevertheless, investigators can be confident that with methodical data gath- 
ering and thorough and objective analysis, they can minimize the possible 
effects of unidentified antecedents. Systematically and logically examining
the effects of antecedents that are believed most likely to have influenced the
probable errors, using investigative processes to determine the role of
antecedents in error, and relying on empirical research and previous
investigation findings to support the role of antecedents in error causation
together minimize the likelihood that influential antecedents will be missed.

Sound analytical techniques also enable investigators to recognize when 
they have reached the point at which the search for antecedents should be 
stopped. Earlier in this chapter, the “stopping point,” the point at which 
the search for antecedents should be ended, was discussed. For the pur- 
poses of this text, the stopping point is reached when investigators can no longer 
identify antecedents that can serve as the target of remedial action. Theoretically, 
the search for antecedents is infinite and investigators can never be cer- 
tain that they have identified all possible antecedents. Investigators should 
pursue all issues and seek to identify all potential errors and antecedents. 
However, at some point, the increase in precision needed to understand
the origin of the errors or mechanical failures is not worth the expenditure
of additional resources. Once logic dictates that little further activity
will be worthwhile, further investigative activity becomes unproductive.

For example, suppose investigators identify deficient regulator oversight
as a factor in the defective brakes in the previously discussed example.
They determine that with effective regulator surveillance, deficiencies in the
railroad’s oversight would have been identified, the deficiencies corrected,
and the defective brakes likely identified and repaired. However, the
regulator could argue that it performed the best oversight it could with its
limited resources. It could contend that it does not determine the number of
its inspectors; rather, Congress or Parliament makes that determination in
its legislation. Of course, that is taking the search for antecedents to its
ultimate conclusion. Pursuing the argument to that point is untenable if for no
other reason than no agent could be identified that could implement effective 
remediation strategies. Before that point the investigation will have passed 
the point of diminishing returns, with little additional benefit gained from 
further activity. 





Recommendations 


After determining the relationships between errors and antecedents, identify 
the recommendations needed to mitigate future opportunities for error, the 
final step in the investigative analytical process. Many investigative agen- 
cies propose recommendations as the vehicle for strategies and techniques to 
correct system deficiencies that they have identified. Others use other means, 
but for the purpose of this text, the term “recommendations” will be used to 
describe proposed remediation strategies. 

Recommendations accomplish a major objective of error investigations: to
address and mitigate the system deficiencies or antecedents that led to the
operator errors identified in the investigation, and thereby reduce future
opportunities for error. Recommendations describe at least two separate but
related entities: (1) the system deficiency and its adverse effects on safety
and (2) the proposed remediation strategy or technique to correct the
deficiency and improve safety.

Recommendations begin with an explanation of the deficiency and its 
adverse effects on safety. When referring to error, deficiencies are the ante- 
cedents to errors, but deficiencies can also be mechanical malfunctions, 
design failures, or other system defects. In general, three types of system
deficiencies are the subject of recommendations: those that (1) led to the
accident, (2) contributed to the cause of the accident, or (3) were
identified as system safety deficiencies but were not involved in the cause
of the accident.

The example of the rail accident cited earlier, in which inspectors failed to
detect a flaw in the system, can illustrate how to develop recommendations.
Suppose that investigators identified these deficiencies regarding inspector
performance in failing to recognize the defects in the brakes:


1. Inspector fatigue from abrupt scheduled shift changes
2. Inappropriate inspector expectancy from having inspected flawless
components exclusively
3. Inadequate inspection station illumination
4. Inadequate inspection procedures
5. Defective equipment


Recommendations can be proposed to address each of the safety deficien- 
cies. Because the regulator and the company can correct each deficiency, the 
recommendations can be directed to either one. However, addressing rec- 
ommendations to the regulator would, in effect, direct them to all organiza- 
tions that the regulator oversees. If similar deficiencies are present at other 
companies, the regulator would implement or require corrective action with 
regard to those organizations as well in response to the recommendation. 
For the sake of simplicity, the recommendations used in the illustration will 
be directed to the regulator. 

Investigators can take many directions in proposing recommendations. 
They can suggest specific solutions or leave it to the recipient of the recom- 
mendation to develop its own strategies to address the deficiency. The latter 
method is often preferred since it gives the recipient the latitude to develop 
corrective actions that meet its own needs, so long as investigators are satis- 
fied that the corrective actions will be effective and meet the intent of the 
recommendation. 

To develop a recommendation that addresses the first deficiency or ante- 
cedent, fatigue from an irregular work schedule, investigators can ask the 
regulator to revise its rules governing scheduling practices to prevent abrupt 
changes in shift schedules. Other recommendations, such as requiring com- 
panies to provide adequate rest periods before scheduling operators for 
night work, informing operators of the nature of fatigue and its effects, and 
providing information to both supervisors and operators to help them recog- 
nize operator fatigue, can also be made. 

The second deficiency, expectancies from dealing with flawless components,
can be corrected by requiring companies to include, randomly and without
notice, brake systems with recognizable defects among the items to be
inspected. Doing so would increase the likelihood of inspectors encountering
defects and thus reduce their expectations of flawless parts. This action
would have the additional benefit of creating a mechanism for both companies
and operators to identify potential inspection problems.

To address the third deficiency, inadequate illumination, investigators can
recommend that the regulator require companies to install adequate lighting
in inspection stations. The fourth deficiency, inadequate company inspection
procedures, could be corrected by requiring companies to review existing
procedures, identify the inadequacies, and develop procedures that address
them. The fifth deficiency, defective equipment, could be rectified by
requiring companies to examine their inspection equipment and replace or
repair those items found to be defective. The proposed recommendations
address specific antecedents to acknowledged errors, by identifying the
deficiencies and proposing either general or specific corrective actions to
the proper recipient.
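
The structure of a recommendation described above (a deficiency, its adverse
effect on safety, a proposed remedy, and a recipient) can be sketched as a
simple record. This is an illustrative sketch only: the class name, field
names, and the `is_complete` check are hypothetical, and the wording of each
entry paraphrases the rail inspection example rather than quoting an actual
recommendation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Recommendation:
    deficiency: str      # the antecedent or other system deficiency
    safety_effect: str   # its adverse effect on safety
    remedy: str          # proposed remediation strategy or technique
    recipient: str       # organization asked to act


def is_complete(rec: Recommendation) -> bool:
    """Hypothetical check: every element of the recommendation is stated."""
    fields = (rec.deficiency, rec.safety_effect, rec.remedy, rec.recipient)
    return all(text.strip() for text in fields)


EFFECT = "inspectors failed to recognize defects in the brakes"

recommendations: List[Recommendation] = [
    Recommendation("inspector fatigue from abrupt shift changes", EFFECT,
                   "revise scheduling rules to prevent abrupt shift changes",
                   "regulator"),
    Recommendation("expectancy from inspecting only flawless components",
                   EFFECT,
                   "include defective brake systems in inspections, "
                   "randomly and without notice", "regulator"),
    Recommendation("inadequate inspection station illumination", EFFECT,
                   "require adequate lighting in inspection stations",
                   "regulator"),
    Recommendation("inadequate inspection procedures", EFFECT,
                   "require review and revision of inspection procedures",
                   "regulator"),
    Recommendation("defective inspection equipment", EFFECT,
                   "require examination and repair or replacement of "
                   "defective equipment", "regulator"),
]

for rec in recommendations:
    print(f"{rec.recipient}: {rec.remedy} (addresses {rec.deficiency})")
```

Keeping the deficiency and the remedy as separate fields mirrors the point
that investigators may state the deficiency precisely while leaving the
recipient latitude in how to correct it.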





Summary 


Analyzing error data in accident investigations is similar to conducting 
empirical research; both apply formal methods of inquiry to explain relation- 
ships within data. In a human error investigation, the relationships under 
study are those between errors that led to an occurrence and the antecedents 
that led to the errors. 

Human error investigators usually collect a substantial amount of data. 
However, only internally and sequentially consistent data should be included 
in an analysis. Data that do not meet these standards may have to be dis- 
carded, additional data obtained, and hypotheses revised to account for the 
inconsistencies. 

The sequence of occurrences of the event is determined by working back- 
ward from the event to identify critical errors, and the antecedents that influ- 
enced the errors. Relationships between antecedents and errors should meet 
standards of simplicity, logic, and superiority to alternative relationships, and 
establish that without one the other would not have occurred. Investigators 
should then answer counterfactual questions to determine the role of the 
operator error or errors in an accident's cause, and the role of antecedents in 
error causation. Investigators should also consider the potential presence of 
multiple antecedents after identifying key error antecedents. Multiple ante- 
cedents can cumulatively increase each other’s combined influence on opera- 
tor performance, or interact to differentially affect performance. 

After identifying the antecedents, investigators should develop recommen- 
dations to address safety-related deficiencies identified in the investigation. 
These will include the identified antecedents as well as safety deficiencies 
that were identified but which may not have been antecedents to the errors 
involved in the cause of the event. The recommendations should identify the 
deficiencies and suggest ways to mitigate them. 

HELPFUL TECHNIQUES


* Discard data that do not meet standards of internal consistency,
sequential consistency, reliability, objectivity, and accuracy, and if
necessary, collect new data or revise the prevailing hypotheses.

* Determine the relevance of the data to the circumstances of the
event, the critical errors, and the antecedents to the errors.

* Establish a sequence of occurrences by working backward,
beginning with the final phase of the accident sequence
and progressing to the errors that led to the event and their
antecedents.

* Identify errors by asking, “Would the accident have occurred if this
error had not been committed?”

* Identify antecedents by asking, “Would the operator have
committed these errors if these antecedents had not preceded
them?”

* Establish relationships between antecedents and errors that are
simple, logical, and superior to other potential relationships.

* Propose recommendations, after identifying errors and antecedents,
that identify system deficiencies and suggest remediation techniques.


Equipment











Designers go astray for several reasons. First, the reward structure of 
the design community tends to put aesthetics first. Design collections 
feature prize-winning clocks that are unreadable, alarms that cannot
easily be set, can openers that mystify. Second, designers are not typical 
users. They become so expert in using the object they have designed that 
they cannot believe that anyone else might have problems; only interac- 
tion and testing with actual users throughout the design process can 
forestall that. Third, designers must please their clients, and the clients 
may not be the users. 


Norman, 1988 
The Design of Everyday Things 





Introduction 


A well-known accident involving a complex system, the March 1979 accident 
at the Three Mile Island nuclear generating plant (Kemeny, 1979), demon- 
strated the extent to which poorly designed equipment can adversely affect 
operator performance. Investigators found that the operators, confused by 
the many alarms and warnings signaling a malfunction, had difficulty inter- 
preting the displayed data to understand the event. 

In World War II, the U.S. Army Air Corps and British Royal Air Force 
each recognized the importance of equipment design on the safety of 
pilots who were in training. Both changed aspects of cockpit features to 
enhance flight safety, based on their studies of pilot-aircraft interactions 
(Meister, 1999; Nickerson, 1999). Researchers have continued to study and 
apply human factors and ergonomics principles to the design of equip- 
ment in both simple and complex systems to improve system safety 
(e.g., Corlett and Clark, 1995; Karwowski and Marras, 1999; Wickens and 
Hollands, 2000). 

Research has shown that, although operators obtain much of the oper- 
ating system information they need from system displays, they use other 
sources as well. Mumaw, Roth, Vicente, and Burns (2000) found that opera- 
tors actively acquire information from other operators, maintenance and 
operating logs, and from their own observations of operating conditions.
Today, it is recognized that experienced operators obtain system informa- 
tion from many sources, but still rely extensively on the equipment itself to 
understand the system state. This chapter will examine features of equip- 
ment design to understand their effects on operator performance. 





Visual Information 


Operators acquire and use system-related information to understand the 
current and near-term system states and the associated operating environ- 
ment. Operators can obtain this information through any sensory modal- 
ity. Although most systems present system information visually and aurally, 
some use tactile cues as well, such as the stick shaker in high performance 
aircraft that signals an impending aerodynamic stall. Presenting informa- 
tion through different sensory modalities has unique advantages and disad- 
vantages in terms of their effects on operator performance. 

Visual displays enable information with a high degree of precision to be 
presented. As a result, most system information is presented visually. But 
visually presented information must be displayed properly for operators to 
efficiently obtain critical information, and operators must be looking at the 
displays in order to access the information. 

Visual displays differ in the ease with which operators obtain and interpret
system-related information, depending on different facets of their
presentations. These features affect the quality of operator interpretation
of visual information:


* Number of displays 

* Organization and layout 
* Conspicuity 

* Interpretability 

* Trend portrayal 


The Number of Displays 


Visual information is presented primarily through either analog or digital 
displays. Analog displays are found in older systems, and generally show 
a one-to-one relationship between a component or subsystem and the cor- 
responding display of information. Systems with numerous components 
and subsystems may have hundreds of analog displays, each providing 
critical information about one component or subsystem. For example, the 
illustration of a Soviet era nuclear power plant in Figure 4.1 shows a display 
with dials too numerous for operators to readily monitor. Should one show
information revealing an unusual or unexpected occurrence, the operator
would be unlikely to notice the information without additional assistance.
The operator would have to search the displays to identify and locate the
needed information before even trying to comprehend the cause of the
occurrence. During high workload periods, such as during anomalous operating
conditions, numerous displays could interfere with an operator’s ability to
quickly locate and understand the critical data in the available time.

FIGURE 4.1
Soviet era nuclear power plant. Note the numerous dials and controls. (Copyright Gary Knight.
Reprinted with permission.)


Organization and Layout 


Display organization can influence an operator's ability to access needed sys- 
tem data, especially in a system with numerous displays. Display groupings 
that do not conform to the logic that operators use to understand the system 
state can prolong the time they need to find and understand the needed data. 
The more readily the display organization allows operators access, the fewer 
the opportunities for operator error. 

Rasmussen and Vicente (1989) propose organizing information according 
to what they term “ecological interface design,” by matching the organiza- 
tion of the displays to the operator’s mental model of the system state. This 
will support an operator’s cognitive activities during interactions with the 
systems, and hopefully reduce opportunities for error. 

Poorly organized displays, “cluttered” displays, or displays that do not sep- 
arate critical information from noncritical information will adversely affect 
operator performance. Wickens and Carswell (1995) refer to these adverse 
effects as the “information access cost” of display organization. The greater 
the cost, the more cognitive effort operators exert and the more time they
will need to access and interpret critical information. 


Conspicuity 


The greater the contrast between a display feature and that of other displays, 
the more conspicuous the displayed data will be and hence, the lower the 
operator’s information access cost. Conspicuity is influenced by display size, 
contrast, and luminance relative to adjacent displays. The larger a display,
and the brighter it is compared to others and its surroundings, the
more it will stand out against the prevailing background, and the more
likely the operator will notice it (e.g., Sarter, 2000).


Interpretability 


The more interpretable the data, the more readily operators can use the infor- 
mation to understand the system state. Consider a gauge that displays an 
automobile's coolant temperature. By itself, the temperature has little mean- 
ing to those who are unaware of the engine's optimum temperature range 
in “normal” operating conditions. But a gauge that displays a picture of an 
engine as a face that smiles, with the smile changing to a frown and the face 
color becoming a deeper red as the temperature increases would be consid- 
erably more interpretable to drivers who may otherwise not understand the 
relevance of the temperature to the engine status. 

Designers have used different methods to increase operators’ ability to
understand visually presented data. Abbott (2000) describes a method of 
presenting aircraft engine information that is considerably more interpreta- 
ble than current displays of the same information, because the presentation 
more closely matches the needs of the operator. Aircraft engine-related data 
displays, and their effects on operator performance, will also be discussed 
in Chapter 10. 

Color can also readily convey information. Parsons, Seminara, and Wogalter 
(1999) found that in numerous countries and cultures, the color red indicates 
hazardous conditions. Similarly, green and yellow or amber signify normal 
and cautionary operating conditions, respectively. Designers have often 
placed colors behind a pointer or gauge on analog displays so that operators 
can quickly recognize the value of the component parameter as the pointer 
approaches the color. Automobile drivers use tachometer colors to determine 
when an engine “red lines” or approaches its maximum safe operating range 
to obtain maximum engine performance when changing gears. 

Digital displays allow substantial flexibility in presenting data. They can 
be designed to present pictures, smiles, or frowns, for example, to convey 
information. Some systems use flow diagrams to display the state of electri- 
cal, pneumatic, and other subsystems, enabling operators to quickly identify 
a flow anomaly and recognize its impact on the system as a whole. 

Although digital displays offer flexibility in presenting information, the
relationship of display flexibility to operator performance has not been dem- 
onstrated consistently. Miller and Penningroth (1997) conclude that digital 
displays may not necessarily result in superior operator performance relative 
to analog displays. By contrast, Abbott (2000) believes that properly designed 
digital displays can enhance operators’ ability to interpret data. 


Trend Portrayal 


Because of the dynamism of many complex systems, operators need to quickly
detect and interpret the direction and rate at which component parameters
change, in order to understand their effects on system state. Indeed,
depending on the system, understanding the state of the system at any one
given moment may not be as critical as recognizing how quickly a system
state is changing and the direction of its change. Analog displays have
traditionally presented direction information by the clockwise or
counterclockwise movement of an indicator or pointer, and rate of change by
the rapidity
of that movement. These features are often seen in airplane disaster movies, 
for example, in which the rapidly unwinding altimeter—the instrument that 
depicts an aircraft's altitude—conveys the seriousness of the situation. Some 
analog displays use vertical or horizontal “tapes” or lines to convey trend 
information. The lines move up or down or left or right to convey the direc- 
tion and rapidity of system changes. 

Digital displays do not necessarily present trend information better than 
do analog displays. A digital format that presents system parameters in 
Arabic numerals gives the operator precise parameter information. However, 
in the event of a rapid change, the numerals corresponding to the parameter 
would also change rapidly, and operators may not be able to quickly inter- 
pret the direction of change, that is, whether the parameters are increasing or 
decreasing. Yet, properly designed digital displays can present trend
information in a way that is at least comparable, if not superior, to analog
displays. These
generally depict the nature and rate of the change pictorially to minimize 
operators’ time spent interpreting trend data, as in the illustration of the face 
to portray engine coolant temperature. 





Aural Information 


Visually presented information has one major drawback: operators must
look at the information to receive it. If they are looking at displays of
noncritical information, or engaged in other tasks and focusing elsewhere, they
will not receive the information. Designers compensate for this shortcoming
by supplementing visual displays with aurally presented information.

Because of the salience of aurally presented information (even inattentive
operators receive it), designers have usually relied on it to rapidly
communicate critical information to operators (e.g., Patterson, 1990;
Edworthy, Loxley, and Dennis, 1991). However, aurally presented information
also has limitations in that the conveyed information is less precise than
visually presented information, and it can quickly distract operators and
hinder their performance (e.g., Banbury, Macken, Tremblay, and Jones, 2001).

The quality of aurally presented information is primarily influenced by a
number of factors:


* Conspicuity
* Distractibility
* Uniqueness
* Accuracy
* Relative importance


Conspicuity 


To perceive aurally presented information, operators must distinguish the
critical sound from other sounds. Designers generally use one of two
methods to increase the conspicuity of critical sounds relative to other
sounds: increasing volume or varying such sound elements as pitch,
frequency, and rhythm. Patterson (1990) suggests increasing the volume of
critical sounds by at least 15 dB over the volume of background noises to 
make them clearly audible. In environments in which the ambient sounds 
are fairly loud, this could make the aurally presented information quite loud, 
even approaching dangerous levels over extended periods. 
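
The arithmetic behind this concern can be sketched briefly. The 15 dB margin
below comes from the Patterson (1990) suggestion cited above; the 85 dB
hazard limit and the function names are assumptions chosen only for
illustration, and actual exposure limits depend on duration and on the
applicable standard.

```python
MARGIN_DB = 15.0        # margin over background suggested by Patterson (1990)
HAZARD_LIMIT_DB = 85.0  # ASSUMPTION: illustrative limit, not from the text


def alert_level(ambient_db: float, margin_db: float = MARGIN_DB) -> float:
    """Minimum alert volume needed to stand out from the background."""
    return ambient_db + margin_db


def exceeds_limit(ambient_db: float,
                  limit_db: float = HAZARD_LIMIT_DB) -> bool:
    """True if the required alert volume is above the assumed limit."""
    return alert_level(ambient_db) > limit_db


# A quiet control room versus a loud operating environment (made-up values).
for ambient in (60.0, 80.0):
    level = alert_level(ambient)
    print(f"ambient {ambient} dB: alert at {level} dB, "
          f"exceeds assumed limit: {exceeds_limit(ambient)}")
```

Under these assumed numbers, an alert in the quieter environment stays well
below the limit, while the same margin applied in the louder environment
pushes the alert above it.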


Distractibility 


Once aural information has been presented, continuing to present the 
sounds adds little additional information, and can distract operators and 
degrade their performance. The longer aural information continues to be 
presented and the more conspicuous the sound, the more likely the infor- 
mation will interfere with and degrade operator performance. On the other 
hand, Banbury et al. (2001) point out that after about 20 minutes of expo- 
sure the distracting effects of sounds are reduced. Unfortunately, exposure 
to interfering sounds for as long as 20 minutes can substantially degrade 
operators' ability to respond effectively in that interval. 

Aurally presented information should cease to be presented after opera- 
tors have received and understood it. However, many systems cannot rec- 
ognize when this has been accomplished. Too often, aural information first 
informs and then distracts operators. 

Aural information can also distract and interfere with the work of operators 
who were not targets of the initial information, especially in small 
operating environments such as locomotive cabs, ship bridges, or aircraft 
cockpits. Allowing operators to silence alerts might negate these disadvan- 
tages. However, as will be discussed shortly, systems that allow operators to 
silence alerts have other disadvantages. 


Accuracy 


Aurally presented information that is inaccurate or inconsistently useful 
will lose its value over time, eventually failing to elicit operator attention. Yet, 
designers generally consider the consequences of missed alerts, where aural 
information is presented but operators do not respond, to be more critical 
to system safety than those of false alarms, where alerts are sounded but a 
response is unnecessary. As a result, designers tend to favor providing more 
rather than fewer alarms in a system to ensure that operators are informed 
of potentially important systems-related information. 

Further, designers set the threshold in these systems sufficiently low to 
ensure that critical events will elicit alerts, even if this results in noncritical 
events eliciting alerts as well. Unfortunately, doing so with sufficient 
frequency can expose operators to repeated false alarms, which have been found 
to reduce operator sensitivity to the alerts and, as Sorkin (1988) found, can 
on occasion even lead operators to silence them altogether. 
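
The threshold trade-off described above can be illustrated with a simple signal-detection sketch. Everything in it is an assumption made for illustration: the monitored parameter is taken to be normally distributed, with higher readings during critical events, and critical events are assumed to be rare (1 in 100).

```python
from statistics import NormalDist

# ASSUMED distributions of the monitored reading (arbitrary units).
NONCRITICAL = NormalDist(mu=0.0, sigma=1.0)
CRITICAL = NormalDist(mu=2.0, sigma=1.0)


def alert_rates(threshold):
    """Return (hit_rate, false_alarm_rate) when alerts sound above threshold."""
    hit_rate = 1.0 - CRITICAL.cdf(threshold)
    false_alarm_rate = 1.0 - NONCRITICAL.cdf(threshold)
    return hit_rate, false_alarm_rate


def share_of_alerts_that_are_real(threshold, base_rate=0.01):
    """Fraction of all alerts that correspond to a truly critical event."""
    hit_rate, false_alarm_rate = alert_rates(threshold)
    true_alerts = base_rate * hit_rate
    false_alerts = (1.0 - base_rate) * false_alarm_rate
    return true_alerts / (true_alerts + false_alerts)


# Compare a low threshold with a higher one.
low, high = 0.5, 1.5
low_rates, high_rates = alert_rates(low), alert_rates(high)
```

Under these assumptions, lowering the threshold raises the proportion of critical events that elicit an alert, but it raises the false alarm rate as well, so a smaller share of the alerts that operators hear is meaningful.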

In a January 1987 rail accident near Baltimore, Maryland, the locomo- 
tive engineer and brakeman, who were operating two freight locomo- 
tives, silenced an alarm they had considered distracting, a shrill whistle 
that sounded when the head locomotive passed a stop signal (National 
Transportation Safety Board, 1988). Neither operator noticed or responded to 
the stop signal, and the train consequently entered a track section that was 
reserved for an approaching, high-speed, passenger train. The passenger 
train then struck the freight locomotives, killing its engineer and 16 passen- 
gers. Investigators concluded that the aural alert would have informed the 
freight locomotive engineer and brakeman of their impending entry onto a 
prohibited track section. Investigators also found that the two operators had 
smoked marijuana before the accident and were impaired at the time. 


Uniqueness 


Designers create distinct sounds that are associated with different system 
elements, system states, or desired operator responses. Uniqueness charac- 
terizes the degree to which a sound is associated with specific system-related 
information. Operators learn to associate certain sounds with their corre- 
sponding system states so that when the sounds are heard the operators can 
quickly recognize their meaning, and will be unlikely to confuse the sounds 
with others. For example, emergency vehicles use distinctive sirens to alert 
drivers in order to increase the likelihood that drivers will quickly recognize 
and respond to them. 

Aurally presented information can take a variety of forms. “Traditional” 
sounds such as bells, whistles, horns, and sirens are found on older equip- 
ment. Each sound can be readily distinguished from others and, if loud 
enough, could be heard over ambient sounds. 

As with visually presented information, modern digital equipment usually 
offers more flexibility in presenting aural information than does older equip- 
ment. Synthesized or recorded human voices that articulate simple voice 
messages can be used, in addition to traditional sounds (Stern, Mullennix, 
Dyson, and Wilson, 1999). Belz, Robinson, and Casali (1999) proposed using 
auditory icons, such as screeching tires, sounds that can be distinctly asso- 
ciated with particular system states, to enhance operator recognition and 
response to the sounds. 

Today, digital capabilities have expanded to the point that many automobiles 
are equipped with navigation systems that can guide drivers to their 
destinations, taking into account traffic flow as well as proximity to other 
vehicles, whether in front, behind, or alongside. Drivers of vehicles not so 
equipped can use their smartphones for navigation and other capabilities. 
They can also select from a number of voices, male or female, for example, 
with different accents or in different languages, so that simple instructions 
such as “turn left in 70 meters” can be quickly understood. As will also be 
discussed in Chapter 10, vehicles equipped with electronic devices can, if 
needed, provide accident investigators with useful information not only 
about selected routes, but also about braking, lane changing, and other data 
that can describe driver performance before an accident. 


Relative Importance 


A single event can precipitate multiple system warnings or alerts, each 
reflecting the state of a single system parameter rather than the event that 
led to the parameter state. In some systems, certain phenomena can elicit so 
many sounds and alarms from the effects of an event, rather than the precipi- 
tating event itself, that a cacophony of sounds is produced. When this occurs, 
the operator's ability to effectively evaluate the individual alerts in order to 
understand the phenomenon that led to the alerts, rather than the effects of 
the phenomenon on the system, is made considerably more difficult. 

Some systems inhibit both visual and aural alerts, without operator action, 
during critical operating cycles. This reduces the likelihood that noncritical 
alerts would distract operators during critical tasks, as occurred in the crash 
of a Boeing 757 off the coast of Lima, Peru, in October 1996. Investigators 
found that pitot-static tubes, critical components that are necessary to mea- 
sure airspeed, climb and descent speeds, and altitude, were blocked by a 
maintenance error, which led to the speed and altitude displays presenting 
erroneous information to the pilots (Accident Investigation Commission, 
1996). After takeoff, numerous airspeed and altitude warnings and alerts, 
including low terrain, low airspeed, impending stall (the “stick shaker”), 
and wind shear, sounded. The alerts began within 5 seconds of each other 
and continued until impact. Each signaled a specific hazardous situation, 
but there was no alert that corresponded to the failure that had caused the 
multiple alerts—the blocked, and hence inoperative pitot-static system. The 
pilots were unable to determine the cause of the alerts. More important, 
operating at night and over water they could not visually estimate the air- 
plane’s airspeed and altitude. The alerts distracted the pilots, hindered their 
communications, and interfered with their ability to effectively diagnose the 
anomaly. 
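
The inhibition of noncritical alerts during critical operating cycles, mentioned at the start of this discussion, can be sketched as follows. The alert names, the `critical` flag, and the notion of a single "critical phase" are hypothetical simplifications, not a description of any actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    critical: bool  # ASSUMED designer-assigned classification


def alerts_to_present(active_alerts, in_critical_phase):
    """Return the alerts that should reach the operator right now.

    Outside a critical phase every active alert is presented; during one,
    noncritical alerts are withheld without any operator action.
    """
    if not in_critical_phase:
        return list(active_alerts)
    return [alert for alert in active_alerts if alert.critical]


active = [Alert("cabin temperature", False), Alert("impending stall", True)]
presented = alerts_to_present(active, in_critical_phase=True)
```

A scheme of this kind limits distraction, but it does not address the problem seen in the accident above, in which every alert was individually critical and none identified the underlying failure.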

Multiple warnings or alerts that sound simultaneously or in quick succes- 
sion often require the highest level of operator performance. Yet, they are 
often presented when operator workload tends to already be high because 
operators must (1) continue to operate the system, (2) diagnose and respond 
to the anomaly, and (3) avoid causing additional damage. These tasks are 
challenging in combination, but when multiple alerts sound simultaneously 
during periods of high workload they can degrade performance. 

Some years later, an accident occurred that shared many of the characteristics 
of the 1996 Boeing 757 accident: an Airbus A-330 crashed into the Atlantic 
after its pitot tubes became blocked. Investigators attributed the blockage 
to ice crystals that formed after the airplane entered an area of adverse 
weather while at cruise altitude (Bureau d’Enquêtes et d’Analyses pour la 
sécurité de l’aviation civile, 2012). The initial aural alert that the crew 
received pertained to the autopilot's disengagement, not to the blocked pitot 
tubes. This alert is critical because it informs pilots that the autopilot is 
no longer operating and that they must control the airplane manually. 
However, the underlying cause of the disengagement, the rapid alteration 
in measured airplane speed caused by the pitot-tube blockage, was not 
presented. As a result, investigators noted that, 


Since the salience of the speed anomaly was very low compared to 
that of the autopilot disconnection, the crew detected a problem with 
this disconnection, and not with the airspeed indications. The crew 
reacted with the normal, learned reflex action, which was to take over 
manual control ... (Bureau d'Enquêtes et d'Analyses pour la sécurité de 
l'aviation civile, 2012, p. 173) 


The pilots' failure to recognize that the airspeed they were perceiving was 
inaccurate led to their failure to recognize the cause of the problem that they 
encountered and their subsequent mismanagement of airplane control. The 
airplane stalled and crashed into the ocean 4 minutes and 23 seconds later, 
killing all 228 passengers and crew onboard. 


Kinesthetic/Tactile Alerts 


Some have proposed presenting information through sensory modalities 
other than visual and aural ones to compensate for the limitations of pre- 
senting information in these modalities (e.g., Sklar and Sarter, 1999; Sarter, 
2000). Transport airplane designers use both kinesthetic and aural cues to 
simultaneously alert pilots to a critical event, an aerodynamic stall. A stall 
requires immediate pilot action or the airplane may crash. Just before reach- 
ing the airspeed that would precede an aerodynamic stall, pilots hear a par- 
ticular alert and feel a distinctive control column motion, sensations that 
are very difficult to ignore. However, as with aurally presented information, 
constant presentation of kinesthetically or tactually presented information 
can distract operators and degrade their performance. 





Controls 


Operators use controls to modify system state and system operation. Control 
design characteristics can influence operator performance and the likeli- 
hood of error, as can displays. Controls can take many shapes and forms, 
move in a number of directions, and be placed in a variety of locations. 
Automobiles, for example, employ at least three primary controls to enable 
drivers to direct their vehicles. The accelerator controls forward motion, the 
brake pedal slows or stops the vehicle, and the steering wheel controls lateral 
motion. Vehicles equipped with standard transmissions have two additional 
controls, a clutch and gearshift lever, for changing transmission gears as 
vehicle speed and engine rotation rates change. Other controls enable drivers 
to maintain selected speeds, adjust windshield wiper speed and 
headlight brightness, sound the horn, and engage turn signals. 
Further controls allow passengers and drivers to change window height, 
audio and video system characteristics, and vehicle interior temperature or 
ventilation levels. 

Investigators generally apply these criteria to assess the quality of control 
design: 


* Accessibility and location 
* Direction of movement and function 
* Shape 
* Placement 
* Standardization 

The quality of keyboard, touchscreen, and other digital controls, 
which are increasingly used in complex systems, is evaluated according to 
other criteria that will be addressed subsequently in this chapter. 


Accessibility and Location 


Accessibility, the ease with which operators can reach and manipulate 
desired controls, can influence the quality of operator performance. In sys- 
tems with relatively unlimited space, in which time to manipulate controls 
is not critical, accessibility will not substantially influence operator perfor- 
mance. However, in systems with space limitations, designers need to shape 
and locate controls so that operators can readily access them, irrespective 
of the operators’ physical characteristics such as arm length. Well-designed 
systems have controls that operators can reach and manipulate without 
moving far from their stations. 

Inaccessible, hidden, or obscured controls can delay operator response 
when time is critical and thus serve as antecedents to error. Large church 
or concert organs illustrate well-designed controls. Organists adjust their 
access to the controls by moving their seats, and use both their hands and 
feet to operate the controls, the keys, and the pedals. 


Direction of Movement and Function 


The direction in which a control moves should intuitively correspond to 
the direction of change in the corresponding component. Raising a control 
should increase an aspect of the system such as production rate, compo- 
nent height, or illumination level, while lowering a control should reduce it. 
Depressing a button should engage a component function while releasing 
the depressed button should disengage it. Controls that move in directions 
that are counterintuitive can become antecedents to error if operators actuate 
the control incorrectly after using similar controls that move in a “standard” 
direction. 


Mode Errors 


Systems with limited available space, as well as advanced electronic controls, 
often employ multifunction controls in which one device controls multiple 
system functions. Operators who are unfamiliar with or do not perceive the 
distinction among the various control functions may initiate a control action 
that produces an unanticipated system response, what has become known as a mode 
error (Norman, 1988). 

Multifunction controls can be designed to reduce opportunities for mode 
errors by giving operators unambiguous information or feedback regarding 
the system’s operating mode. The quality of the feedback is affected by 
the same visual, aural, and kinesthetic factors discussed previously. Visually 
presented feedback should be sufficiently conspicuous to enable operators 
to receive the information. Aurally presented information is likely to be 
the least confusing, but operators will tend to ignore aural information if 
presented repeatedly. 

Investigators concluded that the pilots of an Airbus A-320 that crashed 
short of the runway at Mont St. Odile, France, in 1992, committed a mode 
error while preparing to land (Commission of Investigation, 1993). A single 
control, a knob that turned clockwise or counterclockwise to increase or 
decrease the rate of change in the desired mode, controlled both the 
airplane’s descent rate and its flight path angle. Pilots selected the mode 
by either depressing or pulling the knob and then turning it to establish 
the desired descent rate or descent angle. Incorrectly controlling the knob 
engaged the mode other than the one intended. 

Investigators concluded that the pilots had inadvertently selected the 
wrong mode, and established a descent rate that was triple the typical rate, 
believing that they had commanded a moderate descent angle. Because of 
the dual purpose of the control knob, and ambiguity in the information pre- 
sented regarding the descent rate that they had engaged, the pilots were not 
aware of their error and then failed to notice the rapid descent to the ground. 
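
How a single knob setting can command very different responses in different modes can be shown with a short sketch. The scaling factors and display format below are assumptions chosen purely for illustration; they are not taken from the accident report or from any actual aircraft.

```python
from enum import Enum


class DescentMode(Enum):
    VERTICAL_SPEED = "vertical speed"
    FLIGHT_PATH_ANGLE = "flight path angle"


def interpret_knob(setting, mode):
    """Translate one knob setting into a commanded descent.

    ASSUMED scaling: the same number means hundreds of feet per minute in
    one mode and tenths of a degree in the other.
    """
    if mode is DescentMode.VERTICAL_SPEED:
        return -setting * 100, "ft/min"
    return -setting / 10, "degrees"


def feedback(setting, mode):
    """Unambiguous feedback: name the mode and the units, not just the number."""
    value, unit = interpret_knob(setting, mode)
    return f"{mode.value}: {value} {unit}"


intended = feedback(33, DescentMode.FLIGHT_PATH_ANGLE)
engaged = feedback(33, DescentMode.VERTICAL_SPEED)
```

With this hypothetical scaling, the identical setting of 33 yields a moderate descent angle in one mode and a steep descent rate in the other. Feedback that shows only the bare number gives the operator no cue to the difference, whereas feedback that states mode and units does.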


Shape 


Controls can take a number of forms, designs, and shapes, such as knobs, 
buttons, wheels, switches, levers, or pedals. Designers may shape a control 
to resemble a distinctive task or function. In some systems, regulators have 
mandated specific design characteristics. For example, the lever that extends 
or retracts airplane landing gear is required to be circular to reduce the pos- 
sibility of confusion with an adjacent control. By shaping the lever to cor- 
respond to the shape of the controlled component, the aircraft wheels, pilots 
can recognize the control by touch alone, minimizing the possibility of con- 
trol confusion. 

Control shape can play an important role in operator performance. In high 
workload or stressful situations, operators may not have the time to visually 
identify a control before manipulating it. Rather, they may locate and select 
controls by touch alone, without visual verification. In these circumstances, 
operators may find similarly shaped controls to be indistinguishable, and 
select the wrong control. 


Placement 


Controls that actuate different subsystems or have different functions (e.g., 
go fast and go slow) should not be placed near each other. If they are, they 
should be shaped differently so that when operators must engage them 
quickly, they can identify the desired control without having to verify it 
visually. The effects of placing identically designed controls, with differing 
actuation results, adjacent to each other can be seen in the investigation of 
a marine accident (National Transportation Safety Board, 2011). 


FIGURE 4.2 

Identically shaped controls in which one rapidly speeds up and the other slows down the 
vessel’s engine, placed adjacent to each other, on the first row, center (speed up) and right- 
most (slow down) positions. In both cases, actuation resulted from depressing the buttons. 
(Courtesy of the National Transportation Safety Board.) 


In this accident, which was caused by a marine pilot’s late recognition of 
the need for a turn (influenced by his fatigue), the vessel he was piloting 
first collided with a vessel traveling in the opposite direction, and then 
collided with a second, docked vessel. Just before the accident, the captain, 
seeing the impending collision, attempted to rapidly slow the vessel by 
depressing a button that caused the engine to slow quickly. However, that 
button was located adjacent to an identical button that caused the engine to 
do the opposite, speed up rapidly, and it was this second button that the 
captain actually depressed. Although investigators determined that the 
accident could no longer be avoided by the time the captain actuated the 
control, they faulted a design that was counter to the standards of good 
design (Figure 4.2). 


Standardization 


Operators have come to expect a certain configuration, shape, and direction 
of movement in the controls that they manipulate. Unfortunately, unless reg- 
ulators establish rules governing the design of both displays and controls, 
designers may create designs that suit their own rather than the operators’ 
needs. This can lead to differences in the shape of similar controls on compa- 
rable equipment. Those who have driven cars at night that are different from 
their own, and had difficulty locating and engaging the windshield wipers 
or headlights because the controls were located in unexpected places, have 
witnessed the errors these control differences can create. 

So long as operators interact with only one type of equipment, nonstandard 
control shapes, locations, and directions of movement will not create 
antecedents to errors. However, operators interacting with comparable 
equipment that has different controls and displays could, out of habit, move 
a control incorrectly or actuate the wrong control when alternating between 
equipment types. If operators repeatedly reach to one location to access a 
control, or move it in a certain direction to accomplish an action, they will 
likely continue these movements on different equipment, even if the movements 
produce unintended consequences. 

Some years ago, the National Transportation Safety Board found that the 
rate at which pilots failed to extend the landing gear before landing was 
higher among pilots of aircraft that had been designed and built by one 
manufacturer than among pilots of comparable aircraft of other manufacturers 
(National Transportation Safety Board, 1980). The NTSB attributed this 
difference to the location of the landing gear and flap controls. Controls in 
the cockpits of the airplanes with the higher gear-up accident rates were 
positioned differently than were controls on most other aircraft. 
Investigators concluded that pilots who had operated other aircraft would 
occasionally reach for and select the “wrong” controls inadvertently, actions 
that would have been appropriate on those other aircraft. 

Unfortunately, there is no short-term solution for a lack of standardized 
controls and displays. Designers could reduce the role of this antecedent 
to error by adhering to a common control and display design standard. 
However, a transition period would be needed to implement a standard 
to prevent operators from being confused by what may be a new design. 
Equipment already in service will likely continue to remain in service until 
it is no longer economically feasible to do so. In order to standardize com- 
parable controls and displays, the time needed to introduce new or rede- 
signed equipment into complex systems and to train operators to use the 
new designs may be considerable. Unless regulators require standardizing 
the controls and displays in the systems they oversee, standardization will 
be unlikely. 


Keyboard Controls 


In older systems, operators often needed to exert considerable physical 
force to manipulate controls. Today, however, systems use keyboard controls, 
with either the familiar QWERTY format derived from the typewriter 
keyboard or a variant, or graphic interfaces on screens, to actuate system 
controls. Operators using keyboard controls are physically able to control the 
system without error, so long as they do not inadvertently strike the wrong 
key. Without effective feedback from the system, operators may incorrectly 
believe that they have actuated the correct keyboard controls even if they 
have not. Highly automated systems largely rely on keyboards with 
well-separated keys that minimize slips when operators manipulate them by 
touch alone, and place the keyboards in a location that minimizes fatigue 
over extended use. 

In addition, in contemporary systems graphic user interfaces and touch 
screens have increasingly been implemented as system controls. These are 
less likely to lead to inadvertent operator errors than are keyboards, as oper- 
ators must visually determine the selections they make through a display on 
the screen. Other characteristics of automated systems and their effects on 
operator performance are discussed in more detail in Chapter 15. 





Summary 


The manner of presenting system-related information to operators can 
affect their understanding of the system state. Information that involves 
a high degree of precision is generally presented visually. The number of 
displays, their conspicuity, organization, and portrayal of trends in system 
performance influence operators’ ability to obtain and interpret system 
information. Information that is difficult to access and interpret can lead to 
misinterpretations and errors. 

Information that requires immediate attention, independent of opera- 
tors’ focus, is generally presented aurally. Volume, precision, and conspicu- 
ity influence how well operators receive and comprehend the information. 
Continued presentation of aural information can distract and interfere with 
an operator's ability to concentrate, perform other tasks, and perceive other 
aurally presented information. Sounds should be distinctive and associated 
with system states or required operator actions to continue to be meaningful. 
Sounds that are inconsistently associated with system information or opera- 
tor response will lose their meaning over time. 

The design of controls that operators use to alter or modify system oper- 
ations can affect their performance. Controls should be readily accessible 
and move in the direction that corresponds to the direction of change 
in the associated system parameter. Control shapes should be read- 
ily distinguishable from one another, particularly if adjacent. Controls 
with different functions should be shaped differently, and placed away 
from each other to reduce the likelihood of operators inadvertently 
actuating the wrong controls. Over the long term, standardization of 
control and display features will reduce the potential for confusion and 
errors among operators who work on comparable, but nonstandardized 
equipment. 


DOCUMENTING EQUIPMENT 


* Photograph, video record, or otherwise capture a record of displays 
  and controls in the operating environment of the equipment 
  involved in the event. If this is not possible, refer to equipment 
  handbooks for operating station diagrams, as necessary. 
* Use comparable systems or a system simulator, noting differences 
  between the two, if the equipment was excessively 
  damaged as a result of the event. 
* Interview designers to obtain information about the philosophy 
  that guided the display and control design. 
* Interview designers, instructors, and operators, and refer 
  to operating manuals, to obtain information on differences 
  between designers’ and instructors’ intentions and operator 
  practices. 
* Refer to ergonomics handbooks for guidance, if necessary, 
  when evaluating display or control design features (e.g., 
  Sanders and McCormick, 1993; Ivergard, 1999). 


DISPLAYS 


* Document the number of displays and their locations, and 
  note the displays that operators use to understand the event, 
  compared to the total number of displays presented nearby. 
* Note how closely the logic of the organization corresponds to 
  the way operators access displays or their associated controls. 
* Contrast the color, brightness, and data size in the display to 
  comparable features in adjacent displays to determine display 
  conspicuity. 
* Identify display colors, pictures, diagrams, design, or other features 
  that affect data interpretability and, if necessary, refer to 
  operating manuals and handbooks to understand the meaning 
  and relevance of the displayed data. 
* Determine the portrayal of direction and rate of changes in 
  parameter trend information. 


AURALLY PRESENTED INFORMATION 


* Measure sound volume, duration after initial presentation, 
  volume of ambient sounds, and changes in features of the 
  sounds of interest with changes in component status. 
* Document features of aurally presented information among 
  sound elements such as volume, pitch, frequency, and rhythm. 
* Determine the meaning of each aural warning, alert, or other 
  sound, and its association with specific system states. 
* Identify sounds that call for a specific action or response, the 
  information they convey, and the corresponding required or 
  advised operator action or response. 
* Measure the length of time alerts sound before they are 
  silenced, either by the equipment or by the operators. 
* Document operator actions needed to silence alerts, the system 
  state that will resume the alert, and the length of time needed from 
  the first sounding of the alert to its operator-initiated silencing. 
* Interview operators to determine the actions they have taken 
  to silence the alerts. 


CONTROLS 


* Document the location and positions of system controls, their 
  accessibility to operators, and obstructions to accessibility. 
* Determine the direction of movement of the controls and 
  their correspondence to the direction of change in component 
  parameters. 
* Determine the feedback operators receive concerning changes 
  in system state. 
* Assess shapes, sizes, and distances among controls. 
* Determine differences in control parameters among 
  comparable systems if operators interact with equipment of 
  different manufacturers. 
* Document the accessibility, sizes, and distances among 
  keyboard controls. 


The Operator 











Each person—the butcher, the parent, the child—occupies a differ- 
ent position in the world, which leads to a unique set of experiences, 
assumptions, and expectations about the situations and objects she or he 
encounters. Everything is perceived, chosen, or rejected on the basis of 
this framework. 


Vaughan, 1996 
The Challenger Launch Decision: Risky Technology, 
Culture, and Deviance at NASA 





Introduction 


Our uniqueness as individuals (the way we were raised and educated, our 
work experiences, and our genetic makeup) affects the way we perceive the 
world and act upon it. Because of these differences, two operators 
encountering identical system states could perceive them differently, even if 
they have had identical training and similar backgrounds. 

In the past, investigations of error largely focused on the operator, often to 
the exclusion of other system elements that may have contributed as much, 
if not more, to operator errors than factors related to the operator who had 
committed the errors. For example, checklists that were developed to guide 
investigations of error, many of which are still widely used today, focus pri- 
marily on characteristics of the operator rather than other system elements 
(e.g., International Civil Aviation Organization, 1993). 

The role of the operator has changed over the years as technology has 
advanced and been increasingly implemented in complex systems; operators 
now increasingly interact with the system through technology. This has also 
changed the type of errors operators commit, from action errors to more 
cognitive ones (Coury et al., 2010). The approach to investigating error 
advocated in this text considers operator-related factors within a broad view 
of error. It does not minimize the influence of operator factors in incident 
and accident causation; these can, and do, affect performance and lead to 
error. Rather, it puts the error in the context of the system in which the 
operator works, with consideration of antecedents that are associated with 
both action and cognitive errors. 

The central role of the operator in complex systems operations requires 
investigators to recognize the role of operator factors in system performance, 
factors that are associated with both action and cognitive type errors. This 
chapter will focus on operator-related antecedents to explain the effects of 
these factors on operator performance. 





Physiological Factors 


Operator-related antecedents can be categorized into one of two general 
classes: physiological or behavioral. Each includes antecedents with which 
most of us are familiar, having likely observed their effects in our own expe- 
riences, and each can affect operator performance over both the short and 
long term. 

Physiological antecedents can temporarily or permanently impair operators 
and degrade their performance. The number of potential physiological 
antecedents that can influence operator error is sizeable, and numerous 
medical and physiological texts, journals, and articles have examined them. 
This chapter reviews the major ways that physiological antecedents can 
degrade operator performance and lead to error, and suggests the data needed 
to determine whether a relationship between the two exists, but a full 
discussion of these antecedents is beyond the scope of this text. 


General Impairment 
Illness 


Because operators must interpret data, recognize situations, anticipate system 
performance, and make decisions to effectively oversee system operations, 
any condition that degrades their cognitive skills could serve as an anteced- 
ent to error. Physiological antecedents can increase reaction times, interfere 
with cognition, and limit recall ability, among other impairing effects. 

An operator is impaired when the quality of his or her performance has 
been degraded to a level below that needed to function effectively and safely. 
Operators are expected to notify their superiors when they are unfit for duty 
so that others can be found to serve in their place. However, many do not rec- 
ognize the subtle effects of degrading factors on their performance and will 
report to work when they are ill or otherwise unfit for duty. They remove 
themselves from system operations only when their illness or discomfort 
is self-evidently impairing, without realizing the adverse effects of subtle 
impairment from the illness on their performance and on safety. 


The Operator 87 


Researchers have found that even mild illness and discomfort, well below 
what many consider impairing, may still degrade performance and create 
antecedents to error. Smith (1990) examined the effects of two fairly minor 
illnesses, colds and influenza, on performance, ailments that account for 
what he termed a substantial proportion of all consultations in general medi- 
cal practice. He measured cognitive skills and reaction times of volunteers 
who had been infected with a cold or influenza virus, and compared them 
with those who had been given a placebo. Those who were infected demon- 
strated significantly poorer cognitive performance and reaction times than 
those who were not infected, and many of those demonstrating degraded 
performance were in the incubation periods of their illnesses and thus, 
asymptomatic. 


Medications 


The potential effects of both prescribed and over-the-counter medications 
on operators vary according to the potency of the drug, the amount taken, 
the time since taking the drug, the rate at which the drug is metabolized, 
the presence of other drugs in the operators’ systems, and individual varia- 
tion in response to the drugs in question. The side effects of many drugs in 
and of themselves may adversely affect performance. For example, sedating 
antihistamines, found in many over-the-counter cold and allergy medica- 
tions, slow reaction time and cause drowsiness. Their effects on operator 
performance can last hours after being consumed. Weiler et al. (2000) found 
that the performance of automobile drivers in a driving simulator was as 
adversely affected by an antihistamine, diphenhydramine, found in over- 
the-counter cold medications, as it was by alcohol. 

In 1998, a commercial bus ran off the road and struck a parked truck
after the bus driver had fallen asleep, killing him and six of the passengers
(National Transportation Safety Board, 2000a). Toxicological analysis of 
specimens from the body of the driver revealed the presence of diphenhydr- 
amine and two other drugs, all contained in an over-the-counter preparation 
marketed for the treatment of colds and allergies. The amount of the drugs 
and their rates of metabolism indicated that the driver had likely taken the 
medication several hours before the accident. Investigators concluded that 
the drug exacerbated effects of two additional antecedents, an irregular 
sleep-work cycle, and the time of day, to cause the driver to fall asleep while 
driving. 

In a retrospective study, the National Transportation Safety Board exam- 
ined fatal accidents in a variety of transportation modes in the United States 
(National Transportation Safety Board, 2000b). Prescription medications 
were found in the bodies of over 21% of the general aviation pilots killed in 
aircraft accidents in 1 year, and in many of the bodies of operators killed in 
accidents in other transportation modes as well. Investigators concluded that 
both prescription and over-the-counter medications had impaired the
operators and led to the accidents.

Operators whose performance was adversely affected by prescribed
medications may have ingested multiple medications. In that case, there may be
difficulty determining whether the effects of the medications were additive, 
where the effects of each medication added to those of the other medications 
taken, or interactive, where the influence of the medications on performance 
may have differed from more typical side effects because of the influence of 
the other medications in the person's system. While many prescribed
medications, such as blood pressure drugs, have little effect on cognitive
performance, others, particularly opiate pain drugs and the antianxiety
medications known as benzodiazepines (e.g., Xanax, Valium), have been
demonstrated to adversely affect cognitive performance (e.g., Allen et al.,
2003; Zacny, 2003).
Potentially adverse effects of drugs on performance can be found in such ref- 
erences as the Physicians’ Desk Reference, Internet information that manu- 
facturers have made available, or published research. 

Many over-the-counter medications carry generalized warnings on their 
labels about the hazards of driving or operating heavy machinery after use, 
but these warnings are often written in small font, and many users neither 
read the warnings nor recognize the need to apply them to their own situa- 
tions. The extensive promotion of these drugs, their widespread availability 
and use, and the frequent lack of awareness of their side effects, increase the 
likelihood that operators will use them without recognizing their potential 
to impair and degrade performance. Prescription medications, which are 
required to have adverse effects listed and provided to patients, also may 
have their side effects ignored by users. Further, when the medications are 
dispensed, the physicians prescribing the medications and the pharmacists 
dispensing them may not inform users of potential adverse effects. Often 
people need to determine medication effects on their own, either by read- 
ing the information provided by the pharmacy or through internet sources. 
Many may be unaware of the effects on performance of the medications they 
are taking. This may be an issue in the performance of operators in indus- 
tries where few operators are aware of medication side effects or the need to 
attend to them, or in industries in which use of medications may be prohib- 
ited, and hence their use can lead to job loss. In those instances, operators 
may keep their medication use, and/or medical condition hidden from their 
employers. Investigators should also recognize that medication use, whether 
prescribed or over the counter, may indicate an underlying medical condi- 
tion that itself could be impairing. In those instances, medical records from 
the prescribing physicians are likely to be helpful to explain the nature of the 
condition that led to the medication use. 


Alcohol and Drugs of Abuse 


The effects of few, if any, drugs have been studied as much as those of
alcohol. Even small amounts of alcohol can impair performance in a variety of
cognitive and motor tasks (e.g., Ross and Mundt, 1988; McFadden, 1997).
A direct relationship has been established between the amount of alcohol in
the bloodstream, measured by blood alcohol content or BAC, and the extent
of impairment. The higher the BAC, the more impaired the person. In most
of the United States, 0.08% BAC is considered impairing for automobile
drivers, but lower levels, typically 0.05% BAC, are considered impairing by
medical researchers. Unusually high BAC concentrations, say 0.20% or higher
in an individual who is still able to function at some level, may indicate
alcohol dependency or addiction.

Those addicted to alcohol or other substances may experience withdrawal 
after a period of abstinence of even a few hours, withdrawal that can also 
impair performance (Tiffany, 1999). For example, cocaine, a highly
addictive drug (National Institute on Drug Abuse, 1999), is a stimulant.
After its effects have worn off, cocaine users will likely be fatigued,
particularly if
they had taken the drug at a time when they would ordinarily have been 
asleep. Because fatigue impairs cognitive performance, the effects of with- 
drawal from sustained use of cocaine—effects that include mood alteration 
in addition to sleep disruption—can create antecedents to error. 

Investigators determined that the pilot of a regional aircraft that crashed 
on approach to Durango, Colorado, had been fatigued after ingesting 
cocaine the night before the accident (National Transportation Safety Board, 
1989). He and the first officer were flying a challenging approach through 
the Rocky Mountains and were about to land when they struck the ground 
several miles from the runway. Postmortem toxicological analysis of
specimens from the captain’s body found benzoylecgonine, cocaine’s principal
metabolite. Given the amount of the drug and its metabolite that were found, 
and the rate of cocaine metabolism, investigators determined that he had 
consumed the drug between 12 and 18 hours before the accident. Because 
the accident occurred at 6:20 p.m. local time, he would likely have consumed 
the cocaine the night before, at a time when he would ordinarily have been 
asleep, thus disrupting his normal sleep pattern. Further, after taking the
cocaine, he would have been expected to encounter difficulty sleeping
until the effects of the drug had worn off.
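
The timing inference in this case is clock arithmetic, and a short sketch can make it explicit. The Python example below is illustrative only. The function name `ingestion_window` and the calendar date are assumptions introduced here; only the 6:20 p.m. accident time and the 12-to-18-hour interval come from the investigation as described above.

```python
from datetime import datetime, timedelta


def ingestion_window(event_time, min_hours_before, max_hours_before):
    """Return (earliest, latest) possible ingestion times.

    The bounds are the toxicologists' estimate of how long before the
    event the substance was consumed.
    """
    earliest = event_time - timedelta(hours=max_hours_before)
    latest = event_time - timedelta(hours=min_hours_before)
    return earliest, latest


# Placeholder date (an assumption); only the 6:20 p.m. local time matters here.
accident = datetime(2000, 1, 2, 18, 20)
earliest, latest = ingestion_window(accident, 12, 18)
print(earliest.strftime("%H:%M"), "to", latest.strftime("%H:%M"))
```

Under these assumptions the window runs from 12:20 a.m. to 6:20 a.m. local time, hours during which a person on a daytime schedule would ordinarily be asleep.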

Investigators concluded that the captain’s piloting skills “were likely 
degraded from his use of the drug before the accident” and that he was 
likely experiencing the effects of withdrawal, including, “significant mood 
alteration and degradation, craving for the drug, and post-cocaine-induced
fatigue” (p. 29). The findings demonstrate that performance can be degraded
even hours after someone has consumed drugs and the drugs have been
metabolized.

Other accidents have also shown the adverse effects of illegal drug con- 
sumption on operator performance. For example, in the 1991 train acci- 
dent discussed in Chapter 4, in which two freight locomotives had passed 
a stop signal and inappropriately entered a track reserved for a passenger 
train, investigators determined that shortly before the accident the engineer 
and brakeman had ingested marijuana while operating the locomotives.
Investigators concluded that they proceeded beyond the stop signal because
they were impaired from the effects of their marijuana consumption. 

As with alcohol, high levels of a drug or its metabolites may indicate that the 
operator is a drug abuser, that is, a long-term user of a drug, or is a drug addict. 
If an operator is suspected of abusing medications, pharmacy records of pre- 
scribed medications may reveal a pattern of use over time. The operator may 
have approached several physicians and obtained prescriptions from each. 
The operator also may not have informed his or her employer of either the 
medication use, or the condition for which the medications were prescribed. 

Other information, such as records of convictions for driving while under 
the influence of alcohol or drugs, may also suggest a pattern of substance 
abuse (see also Chapter 12). In the United States, the Federal Aviation 
Administration requires pilots to report such infractions, and reviews the 
driving records of all pilots to learn of such offenses, regardless of their
self-reports (McFadden, 1997). Substance abuse specialists evaluate all
pilots with two or more convictions (and some with one) to determine whether
they are chemically dependent. Only after these specialists have reviewed
the pilot's history and concluded that he or she would likely refrain from
future drug or chemical use does the agency grant the medical certificate
needed to serve as a pilot.

Company-maintained personnel records may contain information reflect- 
ing an operator's history of substance use. Prolonged absences, or absences at 
the beginning and end of work weeks or work periods, may indicate chemi- 
cal use. Performance appraisals may also show marked changes in work 
habits or work performance—another indicator of chemical dependency (see 
Chapter 12). Depending on the industry, regulators may require operators 
to provide the results of regular medical examinations and their medication 
use. These records should provide investigators with considerable informa- 
tion regarding medical and pharmaceutical antecedents to error. 


Specific Impairment 


Many of the tasks that operators perform require acute vision or hearing, 
or subtle senses of touch. Impairment in any of these sensory modalities 
may lead to errors. Operators in most complex systems are expected to dem- 
onstrate sufficient visual acuity to read displays from their control stations, 
see motion and depth, and distinguish among colors both within and out- 
side of the immediate environment. They should also be able to demonstrate 
sufficient aural acuity to identify various alerts, recognize electronic and 
voice communications and other system-related sounds, and determine the 
direction from which the sounds originated. 

Yet, operators may not always recognize their own impairment and even 
when they do, they may deliberately withhold that information if they 
believe that reporting the impairment could adversely affect their careers. 
Investigators acknowledged this in 1996 when a passenger train operator
failed to stop the train he was operating at a red stop signal. His train struck
another train ahead, killing him and injuring more than 150 passengers 
(National Transportation Safety Board, 1997a). Investigators learned that 
for almost 10 years, the operator had been treated for diabetes, and that he 
had undergone corrective surgery for diabetic retinopathy, an eye disease 
brought on by diabetes. They attributed his failure to stop to impaired vision; 
he was unable to distinguish the signal colors. Although required to do so, 
he did not inform his supervisors of his medical condition, and despite the 
impairment, he continued to serve as a train operator. 

Depending on the complex system and the sensory modality involved, 
impairment may be so subtle that neither operators nor their supervisors 
recognize it, becoming evident only in unusual or unexpected conditions. 
Investigators encountered this in an investigation of a 1996 air transport air- 
craft accident in which the captain, who was attempting to land at New York’s 
LaGuardia Airport, lost depth perception on landing, substantially dam- 
aging the airplane, although all onboard escaped serious injury (National 
Transportation Safety Board, 1997b). 

The captain had been intermittently wearing monovision contact lenses
for several years without incident, lenses that corrected near vision in one
eye and distant vision in the other. However, in certain
visual conditions, the differences in the contact lenses degraded his depth 
perception. The final moments of the flight path had been over water and 
through fog, conditions that obscured background features, until just above 
the runway. The reduced visual cues in the prevailing conditions, combined
with the adverse effects of the monovision lenses, reduced his depth
perception to the point that he allowed the aircraft to descend too low
and strike the runway.





Behavioral Antecedents 


Behavioral antecedents, which develop from the operator’s near- or
long-term experiences, can adversely affect performance. They can, for
example, follow profoundly stressful events, such as the loss of an
immediate family member. The grief and stress of people in these
situations, and the effects of that stress on their performance, are
understandable.

Many have encountered the effects of behavioral antecedents at one time
or another and can attest to their adverse influence. The effects they exert
on the performance of a given operator, however, may differ from those on
another’s performance. Two behavioral antecedents, fatigue and stress, are
of particular interest to error investigators. Others that are influenced by
the company may also be important when examining operator antecedents.


The Company’s Role


Because of the role of the company in the conduct of its operations, ante- 
cedents that may appear to be related to the operator may be more correctly 
attributed to the company. Operators may commit errors because of skill or 
knowledge deficiencies, and these deficiencies may serve as the antecedents 
to the errors in question. But companies that employ the operators estab- 
lish minimum qualification levels, hire applicants whom they believe will 
meet those qualifications, and train and certify them as qualified to safely 
operate their systems. Consequently, because of a company’s role in over- 
seeing its operations, company antecedents and not operator antecedents 
may influence errors that result from a lack of operator knowledge or skills. 
Companies may also be considered to have influenced operator performance 
if company-established work schedules led to operator fatigue. If, however, 
operators engaged in personal activities that led to their fatigue, then the 
company would not be considered the source of the fatigue and the anteced- 
ent to error. This will be discussed more fully in Chapter 6. 


Fatigue 


Fatigue and its adverse effects on human performance have been studied 
extensively (e.g., Costa, 1998; Gander et al., 1998; Mitler and Miller, 1996; 
Rosekind et al., 1994). The research shows that fatigue degrades human
performance and can contribute to or cause error. Dawson and Reid (1997a, b)
found that those who had been awake continuously for 18 to 27 hours
exhibited cognitive performance decrements equivalent to those associated
with a BAC of 0.05% or greater.

In a study of accidents in several transportation modes, the National
Transportation Safety Board examined the role of fatigue in transportation
safety (National Transportation Safety Board, 1999). As the investigators
concluded,


Researchers have studied factors that affect fatigue, such as duration and 
quality of sleep, shiftwork and work schedules, circadian rhythms, and 
time of day. Cumulative sleep loss and circadian disruption can lead to 
a physiological state characterized by impaired performance and dimin- 
ished alertness. Fatigue can impair information processing and reaction 
time, increasing the probability of errors and ultimately leading to trans- 
portation accidents. (pp. 5 and 6) 


The importance of the effects of fatigue on cognitive performance of opera- 
tors, and their effectiveness in complex system operations, has increased as 
technology has advanced. As Fletcher et al. (2015) write, 


The nature of work-related fatigue has changed over time. For example, 
in predominantly agricultural times, work tended to be mostly physical
and fatigue was therefore likely to be largely physical in nature. After
industrialization, work in many settings became repetitive and fatigue 
increasingly included psychological or cognitive components from 
factors such as time on task. When electricity became widely available 
and night work more common, sleep and circadian factors underlying 
fatigue became prominent. (p. 7) 


Fatigued operators can have a particularly adverse influence on system
safety, as is evident from a substantial body of literature, albeit one that
focuses primarily on highway drivers. Teran-Santos et al. (1999) found that drivers
with untreated obstructive sleep apnea were more likely to be involved in 
highway accidents than comparable drivers who did not have the condition, 
after controlling for pre-accident alcohol and drug use, among other factors. 
Williamson et al. (2011) identified a link between fatigued performance and
highway accidents, errors among hospital staff, and workplace injuries such
as those on construction sites.

Investigators determined that fatigue led pilots of a DC-8 cargo flight to 
commit critical errors in a 1993 accident at the United States Naval Air Station 
in Guantanamo Bay, Cuba, in which the crew allowed the airplane to turn too 
steeply and strike the ground just before landing (National Transportation 
Safety Board, 1994). The three pilots had been sleep-deprived at the time of 
the accident. The captain had received only 5 hours of sleep in the previous 
48 hours and had been awake for 23.5 hours at the time of the accident. The 
first officer had been awake for 19 hours continuously and had gotten only 
10 hours of sleep in the 57 hours before the accident. 

In general, fatigue results from sleep loss in a 24-hour period, accumulated 
sleep loss over several days, disrupted circadian rhythms, and extended time 
performing a task. Most people need about 8 hours of sleep daily, give or
take 1 hour. Thus, those who receive less than 7 hours of sleep in any
24-hour period, or even 8 hours if they typically need 9 hours of sleep, will
likely be fatigued. The greater the difference between the number of hours a
person needs to sleep and the amount that person obtained in the preceding
24 hours, the more likely that person will be acutely fatigued.

People can be chronically fatigued from accumulating a sleep debt, that 
is, not getting the amount of sleep needed to be rested, over several days. 
Again, assuming that most people need 8 hours of sleep nightly, those sleep- 
ing 6 hours a night or less for a week will likely be chronically fatigued by 
the end of the week, as any parent of an infant can attest. 
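
The distinction between acute fatigue and an accumulated sleep debt can be expressed as simple arithmetic. The following Python sketch is offered only as an illustration. The function names, the sample week of sleep durations, and the choice not to let a long night cancel an earlier shortfall are all assumptions made here, not a validated fatigue model.

```python
def acute_deficit(sleep_need_hours, slept_last_24h):
    """Hours of sleep missed in the preceding 24 hours (never negative)."""
    return max(0.0, sleep_need_hours - slept_last_24h)


def cumulative_debt(sleep_need_hours, nightly_sleep_hours):
    """Sum of nightly shortfalls over several days.

    Assumption: a night longer than the person's need does not offset
    earlier shortfalls; each night contributes zero or more hours of debt.
    """
    return sum(max(0.0, sleep_need_hours - h) for h in nightly_sleep_hours)


# Hypothetical week for someone who needs 8 hours and sleeps about 6.
week = [6, 6, 5.5, 6, 6, 5, 6]
print(acute_deficit(8, week[-1]))   # shortfall for the most recent night
print(cumulative_debt(8, week))     # accumulated shortfall for the week
```

For this hypothetical week the most recent night leaves a 2-hour shortfall, while the week as a whole leaves a debt of 15.5 hours, which illustrates why a modest nightly shortfall can still produce chronic fatigue.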

Most people maintain regular activity schedules and they tend to get tired 
and hungry about the same time each day. When their schedules are dis- 
rupted, for example, by transoceanic air travel or by shifting work schedules, 
they will have difficulty sleeping during the “new” night, what had been the 
day in the “old” time zone, no matter how fatigued they are. Several hours 
later, during the day in the “new” time zone, they may be unable to remain
alert, despite coffee, tea, exercise, or other techniques that had otherwise
been effective in restoring alertness. Thereafter, they may experience
similar sleep disturbances upon their return, when they must readjust to the
“old” time zone after having become acclimated to the “new” one.

Many physiological and behavioral functions, including sleep cycle, diges- 
tion, hormonal activity, and body temperature, are regulated in approximate 
24-hour cycles. Disrupting these functions, as occurs with transoceanic airline
travelers and shift workers who work days one week and nights the next, 
is also fatiguing because the change in schedules is more rapid than is the 
body’s ability to adjust. The experience is known as circadian desynchrono- 
sis, popularly referred to as “jet lag.” 

Circadian desynchronosis makes it difficult for people to sleep when they 
would otherwise be awake, and to be awake and alert during times when 
they had been asleep. Disrupted circadian rhythms lead to chronic fatigue, 
until the body adjusts to the new schedule and the person receives sufficient 
rest to compensate for the sleep deficits (e.g., Tilley et al., 1982). Because circa- 
dian rhythms do not adjust rapidly, a person whose circadian rhythms have 
been disrupted may be fatigued for days afterward, depending on the extent 
of the difference between the previous and current schedules. Therefore, 
although it may be early afternoon local time, a period in which operators 
would otherwise be alert, operators experiencing circadian desynchronosis 
may still be performing as if it were 3:00 a.m. Dawson and Fletcher (2001)
and Fletcher and Dawson (2001) studied the effects of circadian
desynchronosis on employee performance and developed a model that
considers circadian effects in scheduling the duty times of shift workers
and transoceanic workers, to minimize the disruptive effects of those
schedules. They found that considering circadian factors in scheduling
workers’ activities can reduce the effects of circadian disruptions.

Certain medical conditions and medications can also be fatiguing. Sallinen 
and Hublin (2015) noted that sleep disorders such as obstructive sleep 
apnea, and pain-related conditions, are fatiguing. Untreated sleep apnea, a 
condition caused by blockage of a person’s airways while sleeping, results 
in numerous awakenings, of which the person may be unaware, because of 
the inability to breathe. Akerstedt et al. (2011) noted that about 5% of the
adult general population has sleep apnea, about 5% to 10% has restless leg
syndrome, and about 3.9% has periodic limb movement, all medical
conditions that lead to fatigue. In addition, prescribed medications used to
treat restless leg syndrome, pain, anxiety, and insomnia, among others, are
either fatiguing or lead to decrements in cognitive performance that
resemble those of fatigue.

Further, the quality of sleep is not constant; people sleep most deeply 
between 3:00 a.m. and 5:00 a.m. in their local time zones. They also experi- 
ence an equivalent decrease in alertness 12 hours later, between 3:00 p.m. 
and 5:00 p.m. local time. With these phased changes in sleep quality, the like- 
lihood of committing errors also changes. Monk et al. (1996) found that the 
number of errors committed in a variety of performance measures, errors
that can be directly related to operator performance and to accidents,
varies with the time of day. In a study of on-the-job injuries in several factories,
Smith et al. (1994) found that night shift workers were injured on the job sig- 
nificantly more often than were day shift workers who performed the same 
tasks for the same employers. The evidence demonstrates that time of day 
can affect performance, and the times in which operators are most likely to 
commit errors occur when they would otherwise be in their deepest sleep 
cycles, between 3:00 and 5:00 a.m., and the afternoon correlate of those hours, 
between 3:00 and 5:00 p.m. 
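
An investigator comparing the time of an error against these windows must decide which clock to use: local time, or the time to which the operator's body is still adjusted. The Python sketch below illustrates one way to make that check. The function names and the notion of a fixed "unadjusted shift" in hours are simplifying assumptions introduced here; actual circadian adjustment is gradual and varies among individuals.

```python
from datetime import datetime, time, timedelta

# Windows taken from the text: 3:00-5:00 a.m. and 3:00-5:00 p.m.
LOW_WINDOWS = [(time(3, 0), time(5, 0)), (time(15, 0), time(17, 0))]


def in_circadian_low(clock_time):
    """True if clock_time falls inside either low-alertness window."""
    return any(start <= clock_time < end for start, end in LOW_WINDOWS)


def body_clock_time(local_dt, unadjusted_shift_hours):
    """Estimate the operator's body-clock time.

    unadjusted_shift_hours is how far local time runs ahead of the
    schedule the operator is still adjusted to (hypothetical input).
    """
    return (local_dt - timedelta(hours=unadjusted_shift_hours)).time()


# Hypothetical case: 1:30 p.m. local time, body clock still 10 hours behind.
local = datetime(2000, 1, 2, 13, 30)
print(in_circadian_low(local.time()))               # judged by local time
print(in_circadian_low(body_clock_time(local, 10))) # judged by body clock
```

In this hypothetical case the local time lies outside both windows, but the estimated body-clock time of 3:30 a.m. lies inside one, which is the situation described earlier for operators experiencing circadian desynchronosis.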

Alertness is critical to error-free performance in complex systems and 
fatigue has been demonstrated to degrade operator performance in those 
cognitive skills that are most needed for operator effectiveness. Gunzelmann 
et al. (2011) and Lim and Dinges (2008) showed that fatigued individuals have 
difficulty with sustained attention. Lim and Dinges (2010) found that the
cognitive responses of fatigued individuals slowed, that is, they took longer
to notice environmental and situational features than did non-fatigued
individuals. Akerstedt (2007) found that critical aspects of cognitive
performance, such as vigilance, memory, and reaction time, among others,
were worse among fatigued individuals than among those who were adequately
rested. Wickens et al. (2015) observed that complex cognitive performance,
such as mental arithmetic and critical reasoning, declined with extended 
sleep deprivation. Performance decrements were found to be worst during 
subjects’ circadian lows. 


Causes of Fatigue 


For our purposes, fatigue results from operator-related antecedents or orga- 
nization- or regulator-related antecedents. Medical conditions that the oper- 
ator is aware of but does not report to his or her company or the regulator, 
if required to do so, are an example of a type of operator-related antecedent. 
However, if the company or the regulator is aware (or should be aware) of the
adverse influence of fatigue on cognitive performance and does not require
operators with sleep apnea or other fatiguing medical conditions to be
diagnosed and treated, then the antecedent would be considered company-
or regulator-related. As information about the deleterious effects of fatigue
and fatiguing medical conditions has increased, companies and regulators
have increased their role in requiring those in safety-sensitive positions with
these conditions to be diagnosed and treated.

Otherwise, operators who, for example, are fatigued because they remained
awake longer than they had planned before going on duty would be
considered responsible for the antecedents. If they did so in order to watch
a film or an event on television, for example, this would almost be considered
a violation rather than an error antecedent. However, if they had insufficient
sleep for reasons that had little to do with their volition, such as infant care
or brief illness, they deserve more consideration. Nevertheless, they must be
considered the source of the antecedent if they were fatigued as a result
of their situations and did not alert their supervisors to that effect.

Companies and organizations can be responsible for the antecedents of
an operator's fatigue if, as noted, they did not require their operators to be
treated for fatigue-inducing medical conditions, did not prohibit their use of
impairing over-the-counter and prescribed medications, or created fatiguing
work schedules. As Fletcher et al. (2015) note, the nature of complex systems
today calls for 24-hour operations. These systems are simply too expensive,
and the societal costs of their nonoperation are too high, to allow them to
cease operations for any length of time. Internationally operating aircraft
and vessels, nuclear power stations, and chemical refineries, for example, 
cannot avoid nighttime operations without causing significant disruption to 
themselves and to society in general. 


Investigating Fatigue 


Unlike medical conditions, where medical records describe diagnoses, or 
alcohol-related impairment, where blood alcohol level provides evidence 
of the degree of intoxication, fatigue is particularly challenging to
assess. As Price and Coury observe, “historically, fatigue has been notori- 
ously difficult to define and operationalize” (2015, p. 86). Because no physi- 
cal measure of fatigue can be taken, investigators must assess the degree of 
fatigue indirectly. They do this by assessing evidence for fatigue, relating 
it to the type of error the operator committed, and determining the likeli- 
hood of other antecedents accounting for the error (Price and Coury, 2015; 
Strauch, 2015). 

Investigators determine that an operator’s error was the result of fatigue 
by first establishing that the operator was fatigued. Medical records that
demonstrate that an operator has an untreated, fatigue-producing medical
condition would be sufficient to establish that he or she was fatigued.
Similarly, evidence of the use of a sedating medication would also be suf- 
ficient. Absent such evidence, investigators establish the presence of fatigue 
by examining the quantity, regularity, and quality of the operator’s sleep in 
the period before the accident. Ideally, a week’s worth of sleep/awake times 
would establish beyond question the quantity and regularity of someone’s 
sleep, but most people have difficulty remembering when they went to bed
and when they arose more than a few days earlier. Therefore,
investigators typically ask operators to note their sleep/wake times for the
72-96 hours before an accident. This record will establish whether an operator
was subject to circadian disruption, and whether he or she got the desired 
8 hours, plus or minus 1 hour, of sleep. Irregularity in sleep schedules and
sleep times shorter than the person's regular sleep hours serve as evidence of
fatigue. Obviously, the greater the deficit from 8 hours, the greater the
irregularity in sleep/wake times, and the greater the cumulative deficit over time,
the more likely the operator was fatigued. In addition, Price and Coury (2015) 
highlight the importance of documenting sleep quality. An operator who
has accrued sufficient sleep (i.e., around 8 hours), with regularity in the time 
before an accident, but whose sleep was diminished by noise, interruptions, 
high temperature, and so on, will be considered to have received insufficient 
quality sleep and hence, to have been fatigued. 
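
The sleep-history review described above lends itself to simple arithmetic. The following Python sketch is illustrative only. The function name summarize_sleep, the dictionary keys, and the sample log are hypothetical, and measuring irregularity as the spread between the earliest and latest bedtime is an assumption, since no formula is prescribed here. The sketch computes the hours slept each night, the cumulative shortfall from 8 hours, the number of nights below the "8 hours, plus or minus 1 hour" band, and the spread in bedtimes.

```python
from datetime import datetime, timedelta

TARGET_SLEEP = timedelta(hours=8)  # nominal nightly sleep cited in the text
TOLERANCE = timedelta(hours=1)     # the "plus or minus 1 hour" band


def summarize_sleep(log):
    """Summarize a sleep log given as (went_to_bed, woke_up) datetime pairs."""
    durations = [wake - bed for bed, wake in log]
    # Shortfall of each sleep period relative to 8 hours (never negative).
    deficits = [max(TARGET_SLEEP - d, timedelta(0)) for d in durations]
    # Count the nights that fall below the 7-hour lower edge of the band.
    short_nights = sum(d < TARGET_SLEEP - TOLERANCE for d in durations)
    # Express each bedtime as minutes after noon so 23:00 and 01:30 compare sensibly.
    bed_offsets = [((bed.hour * 60 + bed.minute) - 720) % 1440 for bed, _ in log]
    return {
        "hours_slept": [d.total_seconds() / 3600 for d in durations],
        "cumulative_deficit_hours": sum(deficits, timedelta(0)).total_seconds() / 3600,
        "short_nights": short_nights,
        "bedtime_spread_hours": (max(bed_offsets) - min(bed_offsets)) / 60,
    }


# Hypothetical 72-hour record, not taken from any investigation.
example_log = [
    (datetime(2024, 3, 1, 23, 0), datetime(2024, 3, 2, 7, 0)),
    (datetime(2024, 3, 3, 1, 30), datetime(2024, 3, 3, 6, 30)),
    (datetime(2024, 3, 3, 22, 0), datetime(2024, 3, 4, 4, 0)),
]
summary = summarize_sleep(example_log)
print(summary)
```

A summary of this kind addresses only the quantity and regularity of sleep. Sleep quality (noise, interruptions, temperature) still has to be documented separately through interviews and site information.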

Once an operator has been identified as having been fatigued at the time of 
the accident, investigators then determine whether the error or errors were 
consistent with someone who was fatigued. In general, we look for
cognitive errors in which the operator misdiagnosed something, or was late in
performing, or improperly performed, a cognitive task that he or she had
completed effectively beforehand. For
example, failure to properly diagnose something, whether a component fail- 
ure or a navigation error, can be the result of any of several factors. A person 
who is fatigued, however, will likely have difficulty shifting attention among 
possible explanations or components, while focusing excessively on a single 
item. Because proper diagnosis requires an understanding of a system and 
its subsystem operations, effectively understanding the nature of a malfunc- 
tion within the system calls for the operator to rapidly examine the system 
and recognize the symptoms of the malfunction. Because cognitive activity 
is slowed when someone is fatigued, evidence of such slowing in cognitive 
performance would be consistent with someone who is fatigued. Similarly, 
someone who fails to quickly recognize a change in situational cues, or who 
is late in performing a task because he or she was late to recognize the need
to perform the task, would provide evidence of being fatigued. On the other 
hand, action errors are typically not related to fatigue. Someone who turns 
on one switch while intending to activate another, adjacent one, has commit- 
ted an error that may or may not be influenced by fatigue. 

Because the types of cognitive errors associated with fatigue can also result
from other antecedents, investigators must exclude other potential antecedents
that could account for an error before concluding that fatigue
led to it. These include shortcomings in training, oversight, selection,
and procedures. For example, an operator who misdiagnosed the cause of 
a component failure may have lacked the knowledge of the component and 
its relationship to the system to enable him or her to effectively understand 
its cause. This lack of knowledge must be excluded as a potential error
antecedent, along with other antecedents that can explain the misdiagnosis, in
order to identify fatigue as the error’s antecedent. Only when all potential
antecedents of the error have been ruled out, and fatigue is the only plausible 
antecedent remaining, can one confidently identify the operator’s fatigue as 
the antecedent of the error in question. 
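
The three-step reasoning just described (establish fatigue, match the error type, exclude alternatives) can be written as a small decision procedure. The sketch below is a hypothetical illustration, not an investigative tool. The class name FatigueAssessment, its field names, and the wording of the conclusions are all assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field


@dataclass
class FatigueAssessment:
    operator_was_fatigued: bool          # step 1: evidence of fatigue
    error_consistent_with_fatigue: bool  # step 2: error type matches fatigue
    # Step 3: antecedents (training, oversight, and so on) not yet ruled out.
    unexcluded_antecedents: list = field(default_factory=list)

    def conclusion(self):
        """Apply the three steps in order and return a plain-language result."""
        if not self.operator_was_fatigued:
            return "fatigue not established"
        if not self.error_consistent_with_fatigue:
            return "operator fatigued, but error not consistent with fatigue"
        if self.unexcluded_antecedents:
            remaining = ", ".join(self.unexcluded_antecedents)
            return "fatigue cannot be confirmed; still to exclude: " + remaining
        return "fatigue identified as the antecedent"


# Hypothetical inputs for illustration only.
open_case = FatigueAssessment(True, True, ["training shortcomings"])
closed_case = FatigueAssessment(True, True)
print(open_case.conclusion())
print(closed_case.conclusion())
```

The ordering matters: an error is attributed to fatigue only when every earlier test has been passed and no alternative antecedent remains.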


Preventing Fatigue 


As noted, the cause of an operator’s fatigue may lie either with the opera- 
tor himself or herself, or with the company or regulator. In either instance, 
companies or regulators can undertake, or have undertaken, rule changes, scheduling
changes, and education activities, among others, to mitigate opportunities 
for operators to oversee system operations when fatigued (e.g., see Gander, 
2015). For example, after a 2009 airplane accident that killed all 49 passengers 
and crew onboard and one person on the ground (National Transportation 
Safety Board, 2010), the U.S. Federal Aviation Administration changed its 
hours of service rules for pilots to account for duty time served during 
pilots’ circadian lows, that is, times when they would ordinarily have
been asleep. Pilots who worked a period of time that would ordinarily not
have been fatiguing were required to have additional rest hours if that
duty time was served at night (FAA regulation 14 CFR Part 117). Many
regulators of complex systems require operators to accrue sufficient rest 
after their duty periods, but few account for changes in the schedules of 
their duty periods, or for rapid transmeridian time zone changes, as is the 
case with long-haul transport pilots. Consequently, the rules may allow 
operators to accrue what would otherwise be adequate rest periods, but 
because of potential shift changes that lead to circadian disruption, for 
example, shifting from day shift to night shift, the operator will likely be 
fatigued in the immediate nights following the schedule change, as the 
body takes several days to adjust the circadian rhythms to such dramatic 
changes. 

The Federal Aviation Administration, in its revised hours of service rules, 
encouraged airlines to adopt fatigue risk management systems, which it
described as “a data-driven process and a systematic method used to
continuously monitor and manage safety risks associated with fatigue-related
error” (14 CFR 117.3). Fatigue risk management systems allow companies
the flexibility to develop schedules of work that address the unique risks of 
their own operations, based on data that the company must collect and ana- 
lyze. Researchers have described the benefits of such programs (e.g., Dawson 
et al., 2012; Fletcher et al., 2015), and implementing a fatigue risk manage- 
ment system or similar program that identifies and addresses the risks of 
operators being fatigued while on duty can reduce, if not eliminate, the role 
of the company or regulator as an antecedent of an operator’s fatigue-related 
error. 


Stress 


The effects of stress on performance have been studied extensively. 
Definitions of stress vary, but the definition by Salas et al. (1996) will be used
here. They define stress as “a process by which certain environmental
demands evoke an appraisal process in which perceived demand exceeds 
resources and results in undesirable physiological, psychological, behavioral 
or social outcomes” (p. 6). 

The effects of individual stressors depend largely on the person and 
how stressful he or she perceives them to be. Stressors that are perceived to be
moderate can enhance performance by counteracting the effects of boredom
or tedium in settings in which little changes (Hancock and Warm, 1989). 
However, stressors considered severe can degrade performance. 


Person-Related Stress 


Personal stress, which arises from circumstances unrelated to an operator's job
that cause “undesirable physiological, psychological, behavioral or social
outcomes,” is differentiated from system-induced stress. Operators may
encounter more than one stressor simultaneously, both person- and system- 
related, and their performance may be affected by both. The more stress- 
ors a person experiences, the more likely that person’s performance will be 
degraded by their effects. 

Person-related stressors include marital breakups, the illness or death of 
family members, or disruptions to routines such as a move or the depar- 
ture of a household member. The effects of personal stress on an opera- 
tor’s performance can be seen in a marine accident that occurred when a 
tugboat pushing a barge on the Delaware River, in Philadelphia, ran over 
a tour vessel (National Transportation Safety Board, 2011). Two of the pas- 
sengers on the tour vessel were killed in the accident. The operator of the 
tug/barge, the mate, was unable to see the tour vessel because he was 
using his cellphone and laptop computer, from a lower level wheelhouse, 
while operating the vessel. As investigators describe, the mate
had told a company official after the accident that
... “he had been ‘consumed’ with dealing with this family crisis; medical
records obtained by the National Transportation Safety Board confirmed 
that the mate’s child, who was undergoing a scheduled routine medical 
procedure that day, had suffered a potentially life-threatening complication 
less than an hour before the mate went on duty.” (p. 17) Investigators held 
him responsible for causing the accident, but it is nonetheless likely that the 
stress of learning of his son’s life-threatening condition affected his ability 
to recognize that he needed to inform company supervisors that he could 
not safely operate the vessel because of the stress that he was experiencing. 
Instead of properly operating the vessel, he was below a deck that would 
have enabled him to view the vessel’s forward path, and was talking on his 
cell phone and using his laptop, presumably to obtain more information 
about his son’s condition. 

Person-related stressors may not necessarily be negative; they can result 
from what most consider happy occasions. Alcov et al. (1982) compared the
accident rates of U.S. Navy pilots who had experienced stressful life events
with those of pilots who had not. The events included marital problems, major
career decisions, relationship difficulties, and job-related problems, as well as
impending marriage and recent childbirth in the immediate family. Pilots who had
experienced stressful events had sustained higher accident rates than pilots
who had not. Despite these findings, it is important to recognize that the 
mere presence of person-related stressors does not imply that an operator’s
performance was adversely affected by stress because of the noted individ- 
ual variations in reaction to stressors. 


System-Induced Stress 


Operators may find unexpected system events to be stressful, such as dis- 
plays with information that cannot be interpreted, unexpected aural warn- 
ings, and controls that appear to be ineffective. Operator reaction to these 
stressors is influenced by their experiences and previous encounters with 
similar events. Operator actions can also lead to stress, particularly if severe 
consequences might result. The more adverse the consequences, the greater 
the stress the operator can be expected to experience when encountering that 
event. 





Case Study 


On February 12, 2009, a Bombardier DHC-8-400, on a flight from Newark, 
New Jersey, to Buffalo, New York, crashed into a residence near Buffalo, 
while the airplane was on its final approach to the airport. All 49 passengers 
and crew onboard the airplane and one person on the ground were killed 
in the accident (National Transportation Safety Board, 2010). Night visual 
meteorological conditions prevailed at the time and nothing was
found wrong with the airplane. Investigators concluded that the captain
had inappropriately responded to a stick shaker alert, which occurs when 
an airplane is about to stall. Rather than lowering the nose and advanc- 
ing power, as he had been trained to do, he pulled the nose back and the 
airplane entered a stall, from which neither the captain nor the first officer 
was able to recover. Investigators found that neither the captain nor the first
officer had effectively monitored the airspeed before the stick shaker had 
alerted them. 

The error in failing to react appropriately to a stick shaker is one that is 
not wholly consistent with fatigue, given the training that the pilots receive. 
That is, all pilots are trained and are required to demonstrate their recognition
of, and appropriate response to, a stall. The stick shaker alert, in which
the control column rapidly moves forward and aft and is accompanied by 
a unique sound, is designed to minimize the time pilots need to recognize 
the impending stall. Both auditory and tactile cues that are unique to this 
impending aerodynamic condition are provided and both are readily identi- 
fiable. The criticality of rapid recognition and the need for an effective crew 
response to the warnings of an impending stall led to the requirement for a 
unique and quickly recognized alert. Consequently, little, if any, diagnosis is
needed to recognize the nature of a stick shaker alert. However, allowing the 
airplane to approach a stall by failing to monitor the airspeed is an error con- 
sistent with fatigue because a proper approach to landing calls for pilots to 
rapidly shift their monitoring among parameters of airspeed, descent speed, 
engine power, and lateral and vertical flight paths. Shifting attention,
vigilance, and monitoring are cognitive skills that have been demonstrated
to be adversely affected by fatigue.

Neither pilot resided in the city from which the flight originated, and both 
had “commuted,” or flown as passengers, from their residences to Newark.
The captain arrived at Newark 3 days before the day of the accident, arriv- 
ing there in the evening, at 20:05, and began a 2-day trip of flights the next 
morning. He spent the night before the 2-day trip in the crew room at the 
airport and awoke before he was required to report for duty at 05:30 the 
next morning and again the following morning, the day of the accident. In 
between, he spent the night at a company-paid hotel. The night before the 
accident, with a 21-hour and 16-minute rest period upon completion of his 
2-day trip, he spent the night in the crew room at the airport. Investigators 
found that at 03:10 and again at 07:26 on the morning of the accident he had
logged onto the airline’s computer. He reported for duty the day of the acci- 
dent at 13:30. 

The first officer flew from her home on the west coast of the United States 
to Newark, changing planes in Memphis. The flight originated in Seattle at 
19:51 local (Pacific) time or 22:51 eastern time and arrived in Memphis at 23:30 
Pacific time or 02:30 eastern time. She then took a flight that left Memphis at 
04:18 eastern time and arrived at Newark about 06:23, eastern time. She then 
rested in the crew room from about 07:32 to about 13:05 when she sent a text 
message from her computer. The flight crews of the flights on which she flew 
to Newark reported that she slept about 90 minutes on the first flight and for 
the duration of the second. 
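
Placing such a timeline on the operator's home clock makes the circadian picture easier to see. The Python sketch below is illustrative only. The variable names are hypothetical, the fixed UTC offsets assume standard time in February, and the calendar dates are an assumption based on the February 12, 2009, accident date and the overnight commute described above.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets; no daylight saving applies in February.
PACIFIC = timezone(timedelta(hours=-8), "PST")
EASTERN = timezone(timedelta(hours=-5), "EST")

# Times from the narrative; the dates are assumed.
depart_seattle = datetime(2009, 2, 11, 19, 51, tzinfo=PACIFIC)
arrive_newark = datetime(2009, 2, 12, 6, 23, tzinfo=EASTERN)
crew_room_start = datetime(2009, 2, 12, 7, 32, tzinfo=EASTERN)
crew_room_end = datetime(2009, 2, 12, 13, 5, tzinfo=EASTERN)


def clock(moment, zone):
    """Return the wall-clock time of a moment in the given zone as HH:MM."""
    return moment.astimezone(zone).strftime("%H:%M")


# Departure expressed in eastern time, matching the conversion in the text.
departure_eastern = clock(depart_seattle, EASTERN)
# Arrival expressed on the first officer's home (Pacific) clock.
arrival_home_clock = clock(arrive_newark, PACIFIC)
# Elapsed commute time and the longest possible rest window in the crew room.
commute_elapsed = arrive_newark - depart_seattle
crew_room_window = crew_room_end - crew_room_start

print(departure_eastern, arrival_home_clock, commute_elapsed, crew_room_window)
```

On these assumptions, the first officer reached Newark in the middle of the night by her home clock, and the crew room interval is an upper bound on the rest she could have obtained there, not a measure of actual sleep.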

Airport crew rooms are provided to pilots and flight attendants to 
enable them to relax before their flights. Little privacy is available and, 
while couches may be provided, these are not designed for crew sleep- 
ing because the room lights are typically bright and there is little effort 
to soften the volume of noise. Crewmembers meet each other and typically
converse before their flights. Therefore, pilots who spend the night
in airport crew rooms may log enough hours of sleep to be considered rested,
but the poor quality of that sleep would negate the potential benefits, even
if it were possible for crewmembers to sleep for the entirety of their stays
in crew rooms. As a result, investigators concluded,
“the captain had experienced chronic sleep loss, and both he and the first 
officer had experienced interrupted and poor-quality sleep during the 
24 hours before the accident” (National Transportation Safety Board, 2010, 
p. 106). 

Crew rooms do not charge crewmembers fees for their use, unlike hotel 
rooms. Regional air pilots, especially first officers, may not earn enough 
compensation to be able to afford hotel rooms. Moreover, while regulator
hours of service rules dictate the number of hours and rest pilots must obtain
while on duty, the rules do not apply to hours spent off duty, as the captain
and first officer were on the night before the accident.

The quality of sleep the pilots obtained in the crew room allowed investi- 
gators to determine that they were fatigued at the time. However, the error 
of not recognizing and responding to the impending aerodynamic stall, as 
noted, was not one consistent with fatigue. In this case, the captain had a 
record of previous errors in training consistent with the one that led to the 
accident. Simply put, his record was such that such an error was consistent 
with the quality of his performance as a pilot when faced with unusual or 
unexpected events. The first officer, by contrast, had no such record of per- 
formance deficiencies. Although investigators determined that the crew was 
fatigued at the time, the lack of correspondence between fatigue and the 
error of not recognizing and responding appropriately to a stick shaker, and 
the presence of an alternative antecedent to error, that is, the captain’s poor 
performance record, prevented them from attributing the critical crew error 
to fatigue. As they concluded, 


Evidence suggests that both pilots were likely experiencing some degree 
of fatigue at the time of the accident. However, the errors and decisions 
made by the pilots cannot be solely attributed to fatigue because of 
other explanations for their performance...The captain’s errors during 
the flight could be consistent with his pattern of performance failures 
during testing, which he had experienced throughout his flying career. 
(National Transportation Safety Board, 2010, p. 107) 





Summary 


Two general categories of operator antecedents, behavioral and physiologi- 
cal, can lead to error. Physiological antecedents can impair performance 
either temporarily or permanently, through disease, medications, or alcohol 
or over-the-counter or illicit drugs. These can temporarily impair operators 
by altering perception, slowing reaction time, and causing fatigue, among 
other adverse effects. 

Behavioral antecedents to error include fatigue and stress, originating 
from the operator’s personal experiences or from company actions. Long 
work schedules, changing shift work schedules, abrupt change in time 
zones, or a combination of these, are company actions that can cause fatigue,
a well-documented antecedent to error. Stress can be caused by factors
related to the job, or events in an operator’s personal life that are indepen- 
dent of the job. 




DOCUMENTING OPERATOR ANTECEDENTS 


MEDICAL CONDITIONS 


* Review operator medical records, both company maintained
and those maintained by a personal health care provider, with
a health care professional.


* Note recent diagnoses of and treatment for medical conditions,
and determine the possible effects of the conditions and associated
medication on operator performance, in consultation with
an occupational health expert.


* Interview colleagues, associates, relatives of the operator, and
the operator if possible, to determine if he or she was experiencing
even a mild illness or temporary discomfort at the time
of the event.


* Interview operator family and colleagues to determine if they
noted changes in the operator’s daily routines, behaviors, or
attitudes, and when these changes were first observed.


* Review company personnel records to detect changes in work
habits, job performance, and attendance.


DRUGS OR ALCOHOL 


* Request, as soon as possible after the occurrence, a blood sample
from the operator, or from the pathologist if the operator
was killed, for a toxicological analysis.


* Ask local law enforcement authorities to recommend a reputable
and qualified laboratory that can conduct toxicological analyses
if local government laboratories are unable to do so.


* Review positive toxicological findings with an occupational
health specialist, another toxicologist, or health care provider
with the necessary expertise. Ask him or her to obtain and
review the results of controlled studies on the medications in
question.


* Give the toxicologist information about the care of the body
or the specimen if the operator has been killed, the state of the
operator’s health before the event, and possible medication that
the operator may have been taking.


* Consult a physician, pharmacist, toxicologist, or a pharmacologist
to learn about the effects of single or multiple drugs on
operator performance when evidence confirms that an operator
took medications before an occurrence.
FATIGUE


* Document the times at which the operator went to sleep and
awoke within the 3-4 days before an accident.


* Determine from medical records and prescription drug records
or a toxicological sample, if possible, if the operator had an
untreated sleep disorder or other fatiguing medical conditions,
or had taken sedating medications.


* Assess fatigue from either an untreated sleep disorder, other
fatiguing medical conditions, or the use of a sedating medication.


* Characterize those who receive 4 or more hours less sleep than
typical in a 24-hour period as acutely fatigued and those who receive
2 hours less sleep than usual over four 24-hour periods as
chronically fatigued.


* Reconstruct the times the operator went to sleep and the times
the person awoke for each of the days since the travel commenced,
using the home time zone as the standard for those
who have traveled across time zones.


* If the operator had traveled before the accident, note the number
of days that the traveler was away from the base schedule,
and the number of days since the traveler returned to the base
schedule.


* Document the time of the accident to determine whether it
occurred between 3:00 a.m. and 5:00 a.m. local time.


* Identify characteristics of fatigue in the critical errors, including
preoccupation with a single task, slowed reaction time, and
difficulty performing tasks that had been performed effectively
before.


* Assess fatigue, if not medical or medication-related, from an
irregular sleep/wake schedule in the days before the accident,
or insufficient rest in that period.


* Determine if the error the operator made was a cognitive one,
and if so, whether it was consistent with a fatigue-related error.


* Exclude other potential error antecedents such as training shortcomings,
inadequate oversight, or a record of poor performance.


AUTOPSIES 


* If the operator was killed in the accident, and his or her medical
condition is unknown, arrange for an autopsy by a forensic
pathologist if possible, or one with additional training and
experience in accident investigations.
* Give the pathologist information on the nature of the accident,
the state of the body, the role of the operator at the time, the 
nature of the machinery with which the operator was interact- 
ing, and data obtained from medical records, peer interviews, 
and other sources. 


* Provide the pathologist with photos of the operator’s body and
of the accident site.


* Ask the pathologist for information on preexisting physiological
conditions, the effects of impact forces (if in a vehicle or
other dynamic environment), thermal injuries, or toxic fumes,
and the presence of corrective lenses, hearing aids, or other
supplemental devices on the body of the operator.


STRESS 


* Interview family and colleagues to determine whether the
operator experienced stressors before the event, and how he or 
she reacted to them. 
The Company











The Japanese nuclear accident three weeks ago occurred largely because 
managers counted on workers to follow rules but never explained why 
the rules were important. 


Wald, 1999 
New York Times 





Introduction 


A company's role in creating antecedents to error has been increasingly 
recognized. The New York Times account of the 1999 accident at a Japanese 
uranium processing plant suggests that the operator errors that led to the 
accident were a direct result of management actions and decisions. A subse- 
quent New York Times article revealed that plant managers compounded the 
effects of their initial actions by not developing an emergency plan in the 
event of an incident (French, 1999). Apparently, the managers believed that
an accident could not occur and, therefore, that no plan was needed. This chapter
will address how companies can create antecedents to error in the systems 
that they oversee and operate. 





Organizations 


Companies operating complex systems can influence the safety of those sys- 
tems in many ways. They hire and train the operators, establish operating 
rules and maintenance schedules and practices, and they oversee compli- 
ance with these rules, schedules, and procedures, among other activities. 
Shortcomings in any of these areas have the potential to create error ante- 
cedents, and companies are ultimately responsible for ensuring that their 
actions minimize the role of potential antecedents, so that the systems they
operate run safely.
Hiring

Companies hire the individuals who operate their systems. In doing so, the 
standards they use can have considerable influence on the safety of system 
operations. Selecting operators who may not be effective will not only be
expensive in training costs, but can also increase the likelihood of subsequent
errors.

Companies generally consider several factors when selecting candidates 
for operator positions. These include requisite skills and knowledge, pre- 
dicted performance as operators, and the number of operators needed to 
oversee system operations. 


Skills, Knowledge, and Predicted Performance 


Companies are expected to identify the knowledge and skills their person- 
nel need to effectively operate systems. They apply these to their hiring stan- 
dards to enable them to identify applicants likely to perform the required 
tasks safely and those applicants predicted to commit a disproportionate 
number of errors. 

For example, maintenance technicians would be expected to perform one
or more of the following tasks:


* Read and understand maintenance manuals


* Understand the structural and mechanical relationships among
components and subsystems


* Apply instructions to tasks

* Identify and locate appropriate components and tools

* Diagnose and correct mechanical malfunctions

* Recognize when the tasks have been completed

* Complete written documentation

* Describe, either orally or in writing, the maintenance actions taken


* Verify that the intent of the maintenance instructions has been
carried out


Deficiencies in performing any of these tasks could lead to errors. The 
more critical skills an applicant can perform, the more effective the appli- 
cant will likely be as an operator, and the fewer the errors he or she will be 
expected to commit. It is a company's responsibility to ensure that those it
hires as maintenance technicians, for example, can perform these tasks competently, either
upon hiring or after completing a training program. Selecting a person for a
position of responsibility without the requisite training to enable that person 
to perform acceptably creates an antecedent to error. 

Companies have also created error antecedents by selecting operators 
based on skills or experience that may not necessarily relate to those actually 
needed for system operations. For example, because interpersonal skills are
critical to the effectiveness of operator teams, companies need to consider 
these skills among their selection criteria, skills that may not always be effec- 
tively assessed through company selection criteria. Members of operator 
teams possessing deficient interpersonal skills may commit errors, or con- 
tribute to the errors of others. 

In industries in which the regulators establish operator-licensing criteria, 
companies apply them as minimum selection standards, limiting but not 
eliminating their ability to implement their own selection standards. In the 
event that investigators identify deficiencies in the skills of licensed opera- 
tors, they need to focus on both the regulator's licensing standards and the 
company’s hiring criteria to identify antecedents to error. 


The Number of Operators 


The number of operators interacting with a system, whether too many, too 
few, or optimum, affects the quality of individual and team performance 
and system safety. Paris, Salas, and Cannon-Bowers (1999) believe that an 
insufficient number of operators can create excessive operator stress because 
of the resultant increased individual workload. However, hiring too many 
operators is wasteful and can create low individual workload, which can 
lead to operator boredom and inattention, both potential antecedents to error 
(e.g., O'Hanlon, 1981). 

Designers, regulators, companies, or operators themselves may establish 
the minimum number of persons needed to operate systems. In aviation, reg- 
ulators require at least two pilots to operate air transport aircraft, even if only 
one is needed to effectively control the airplane. Companies in other systems 
may have more discretion in determining the number of operators they need. 
They may base the decision on the nature of the tasks, the degree of difficulty 
in performing the tasks, the amount of time available to complete them, or 
even on existing agreements with labor organizations. For example, when 
scheduling maintenance tasks, companies may want to return the equipment 
to service quickly and may assign more operators than usual to the task. 

In some systems, operator activity during different system operating 
phases varies and as a result, additional operators may be needed more in 
certain operating phases than at other times. Some companies determine 
the number of operators they need based on an “average” workload level. 
In other systems, where individual workload varies according to the system 
operating phase, companies may match the number of operators to the num- 
ber needed by the operating phases. During periods of high workload, they 
may add operators to match the workload and likewise, they may reduce the 
number needed during low workload periods. For example, air traffic con- 
trol sectors or airspace segments are often combined at night when air traffic 
activity is typically light, thereby reducing the number of controllers needed 
during those low activity periods. 
Researchers have studied methods of determining the number of opera-
tors needed to operate safely. Lee, Forsythe, and Rothblum (2000) developed 
a mechanism to determine the appropriate number of operators needed in 
one complex system, commercial shipping. They incorporated factors such 
as phase of voyage (open waters, restricted waters, and in port), port call fre- 
quency, level of shore-based maintenance support, and applicable work/rest 
standards into their analysis. The number of crewmembers needed varied 
considerably with changes in these factors, factors that would have different 
weights according to the particular system and its operating environment. 


Training 


In general, operator performance deficiencies are more likely to result from 
deficiencies in company training than from deficiencies in hiring. Training
programs significantly affect the ability of companies to reduce error
opportunities, and decisions on the type and length of training can influence
the potential for errors to occur in system operations. 

Training in complex system operations generally involves two components.
The first, initial training, is designed to convey the overall knowledge and
skills necessary to operate the systems effectively, and is administered to
newly hired operators. The second, ongoing training (or recurrent training
in commercial aviation), is designed to maintain the skills and knowledge of
existing operators and to introduce them to changes in the system or to other
safety-related topics. Most companies that operate complex systems employ
some type of initial training to introduce newly hired operators to system
operations; however, not all conduct ongoing training.


Training Content 


After companies have hired candidates to serve as system operators, they 
need to provide them with knowledge of the system and its operating proce- 
dures, and enable them to acquire the skills necessary to operate the system. 
Some companies employ on-the-job training to accomplish these objectives. 
In such training, new employees first observe experienced operators and 
then, over time, learn to operate the system under their supervision. Other 
companies use formal training curricula to train new operators. 

Initial training should describe the system, its components and subsys- 
tems, their functions, normal and non-normal system states, and general 
company policies and operating procedures. Initial training can also intro- 
duce employees to potential system shortcomings. New systems, no matter 
how thoroughly tested before their introduction to the operating environment,
may have difficulties or “bugs” that designers had not anticipated and,
therefore, not addressed. Although training should not be expected to
compensate for design deficiencies, the training environment may be used
to introduce operators to potentially unsafe elements of system operation so
that they will be familiar with them and, if necessary, respond appropriately 
should they encounter these elements. 

Ongoing training is designed to ensure that system operators learn about 
design changes, new and/or modified operating procedures and regulations, 
and other pertinent system changes to enable them to continue to perform 
effectively. Some industries require ongoing training at regular intervals, 
others schedule it only as needed, and some do not conduct additional training
at all. Similar differences can be found among the standards that accrediting
organizations apply to the certification of professionals in their particular
fields (Menges, 1975). Some establish criteria for ongoing training, including
the curricula, instructional media, and intervals between training sessions,
while others establish a minimum number of continuing education credits
or courses to be completed within a certain period.


Instructional Media 


Technology has enabled training systems to inform and educate operators in 
new and innovative ways. For example, simulators can accurately replicate 
system operating conditions with almost the full range of system character- 
istics during both expected and unexpected conditions. These allow opera- 
tors to practice responding in a safe environment free of severe personal 
consequences to scenarios that would otherwise be too dangerous to practice 
in actual systems. Despite acquisition costs that can exceed several million
dollars (e.g., Moroney and Moroney, 1998), simulators and system training
devices have considerably improved operators’ ability to respond effectively 
to nonroutine operating conditions. 

Training programs, whether initial or ongoing, may employ various 
instructional media, including computer-based instruction, CD-ROM, and 
Internet-based presentations, as well as instructor presentations, and text 
and written material. Each has particular advantages and disadvantages for 
students, instructors, and course training coordinators. Some allow more 
flexibility in a student’s pace of learning and some may offer reduced devel- 
opment and delivery costs. Some programs combine instructional media, 
with available instructors to answer questions on specific topics at the stu- 
dents’ own pace, without disrupting the class or distracting other class 
members. 

However, it must be remembered that regardless of the particular medium, 
instructional media and system training can only deliver instructional mate- 
rial; they cannot compensate for deficiencies in the material they present. 
Although the type of instructional medium can influence the quality or 
pace of learning, the quality of the training program is largely dependent 
upon the content of the material presented and not the instructional medium 
delivering it. 


Costs versus Content


Companies strive to maintain effective training programs within budget- 
ary limitations, yet compromises to balance the competing objectives of high 
quality with low cost are a fact of corporate life. Companies exercise consid- 
erable control over the content of their training programs, within the given 
regulatory standards, and try to keep costs down, recognizing that effective 
training in complex systems can be so expensive that they may have to make 
compromises in other areas as a consequence. Reason (1997) suggests that 
the need to maintain the operations that produce the resources necessary to 
fund training influences companies to weigh production needs more than 
nonproduction needs, such as training. 

Because of the often substantial costs of operator training, managers may 
devote considerable effort to operator selection to ensure that those hired 
will successfully complete the training. Some companies even require opera- 
tors to be fully trained before being considered for selection. High training 
costs may also produce an unintended result—inducing companies to retain 
operators whose skills may have deteriorated to avoid the expense of train- 
ing new operators. Such decisions could create company antecedents to error. 

The caliber of a company’s training serves as a measure of its commitment 
to reduce opportunities for error. Those that provide training beyond the 
minimum that the regulator requires, and that spend additional resources 
to ensure that their operators are skilled and proficient, can be said to have 
undertaken positive efforts to reduce opportunities for error. By contrast, 
companies with training programs that meet only minimum standards may 
create opportunities for error. Issues such as these, reflecting on corporate 
culture, will be discussed further later in this chapter.


Procedures 


Complex systems require extensive rules and procedures to guide opera- 
tors on how to interact with the equipment, and to serve as the final author- 
ity on how operations are to be conducted. Procedures also guide operators 
in responding to new or unfamiliar situations, and can help to standard- 
ize operations across companies and even across international borders. 
The International Civil Aviation Organization, for example, has established 
rules governing both air traffic control procedures and aircraft operations 
across international borders, while its marine counterpart, the International 
Maritime Organization (IMO), has developed rules and procedures for use 
in international maritime operations. These two systems, marine and avi- 
ation, which involve international operations as a matter of routine, have 
developed and implemented standard procedures to make compliance with 
rules and procedures across international borders relatively simple. 

For procedures to be effective, operators must perceive them to be logical and
necessary to ensure safe and efficient operations. Otherwise, they may disregard
them over time, as reported at the beginning of this chapter, unless their
fear of adverse managerial action is sufficient to ensure their compliance.
Designers often develop general operating procedures for the systems they
design, but companies are ultimately responsible for the procedures they 
implement, which they can tailor to their own operational needs and require- 
ments. Companies may modify procedures after they have begun operat- 
ing the equipment for such reasons as standardizing procedures across the 
different equipment that they operate, or improving operational efficiency 
as they gain familiarity and experience with the equipment. Such modifica- 
tions, in response to lessons learned during system operations, reflect well 
on a company's oversight and its efforts to enhance operational safety. 


General versus Specific 


Companies face a fundamental dilemma in developing and implementing 
procedures. Procedures should be sufficiently specific and unambiguous 
to guide operators in responding to most situations, yet not so specific that 
operators may feel unable to respond to unexpected situations for which 
applicable procedures have not yet been developed. As Flach and Rasmussen 
(2000) note, “it is impossible to have conventions for unconventional events” 
(p. 170). 

System safety depends on operators following procedures, yet the operators
must still possess the authority to bypass procedures if, given their
experience and expertise, they believe that the circumstances so warrant. 
Explicit procedures provide the guidance operators need to operate systems 
as intended and ensure that different operators control the system similarly. 
However, overly restrictive procedures can work against safety. Reason 
(1997) argues that these may actually encourage operators to develop their 
own shortcuts, circumventing the intent of the procedures. Overly restrictive 
or comprehensive procedures also need extensive management oversight 
to ensure that operators comply with them. Most companies recognize that 
it is impossible to develop procedures for responses to all possible circumstances.
Ideally, the procedures that companies develop and implement will be both
comprehensive and specific, applying to as many potential circumstances as 
possible. 





Oversight 


Oversight is a critical element of a company's responsibility for the safety 
of the system it operates. It is intended to both ensure that operators adhere 
to the operating procedures, and to inform companies of critical aspects 
of system operations in need of modification. Oversight serves both to 
inform operators how to operate the system, and to inform companies how
effectively operators are carrying out their procedures, as well as of procedural
changes that may be needed to ensure continued operational safety.
Effective oversight requires obtaining sufficient operator performance data, 
through frequent and thorough acts of data gathering and inspection, to rec- 
ognize how well the system is being operated, and to identify changes that 
may be needed to enhance operational safety. 

Oversight data should describe employee performance quality in a variety 
of operating conditions. Many large companies, with thousands of opera- 
tors, have too large a span of supervision to allow effective oversight of all 
operators, and they depend on operator performance data for effective over- 
sight. Effective oversight informs companies about all critical aspects of their 
operations. Well-informed companies can quickly respond to operational 
difficulties as they emerge, and thus help to reduce the likelihood that these 
difficulties can become opportunities for error. 

The quality of oversight varies according to the system, the operating 
conditions, the operators, and the severity of the consequences of opera- 
tor error. Since continuous oversight of operators is unfeasible, companies 
must maximize the quality of their oversight, through as many of the oper- 
ating cycles as reasonable, to provide them with sufficient information for 
a realistic portrayal of the quality of operator performance and procedural 
quality. Done properly, operational oversight can be effectively carried out 
by monitoring, recording, and sampling data that is representative of per- 
formance in the various system operating phases. For example, to deter- 
mine the extent of taxpayer compliance with the tax code in the United 
States, the federal government inspects or audits the tax returns of fewer 
than 5% of taxpayers in a given year. Yet, this sampling of taxpayers reveals 
that the overwhelming majority of taxpayers comply with the tax laws, 
despite the low probability that an individual’s returns will be audited. 
Most taxpayers know that they will receive stiff penalties if convicted of 
evading tax laws and they are unwilling to face the penalties, irrespective 
of the low risk of being caught. By sampling a few tax returns the federal 
government can maintain realistic oversight of taxpayer compliance with 
tax laws. 
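
The sampling approach described above can be illustrated with a brief sketch. It is a minimal illustration only, assuming a hypothetical set of recorded operations, each tagged with an operating phase and a flag for procedural compliance. The function names `sample_by_phase` and `compliance_rate`, the phase labels, the synthetic data, and the 5% sampling fraction are assumptions made for the example and do not come from any actual oversight program.

```python
import random
from collections import defaultdict

def sample_by_phase(records, fraction, seed=0):
    """Draw the same fraction of records from every operating phase.

    `records` is a list of (phase, complied) pairs; both fields are
    hypothetical stand-ins for real oversight data.
    """
    rng = random.Random(seed)  # fixed seed so the audit sample is repeatable
    by_phase = defaultdict(list)
    for phase, complied in records:
        by_phase[phase].append(complied)
    sample = {}
    for phase, outcomes in by_phase.items():
        k = max(1, round(len(outcomes) * fraction))  # at least one per phase
        sample[phase] = rng.sample(outcomes, k)
    return sample

def compliance_rate(sample):
    """Estimate the share of compliant operations in each sampled phase."""
    return {phase: sum(o) / len(o) for phase, o in sample.items()}

# Synthetic data: 1,000 routine and 200 nonroutine operations.
records = [("routine", i % 50 != 0) for i in range(1000)]
records += [("nonroutine", i % 10 != 0) for i in range(200)]

audit = sample_by_phase(records, fraction=0.05)
rates = compliance_rate(audit)
```

Sampling each operating phase separately, rather than drawing one sample from all operations, is one way to keep the less frequent nonroutine phases represented in the oversight data.
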


New Operators 


Companies need to respond to operator performance deficiencies before 
their performance can jeopardize system safety. Consequently, operators 
with manifestly deficient performance are rarely encountered in complex 
systems; companies tend to address the issue before safety is jeopardized. 
Organizations that fail to deal with deficient performance can create antecedents
to error, and investigators may occasionally encounter operators with
histories of performance deficiencies. 

Many companies place new operators in probationary periods, during which
the companies have full discretion to evaluate operator performance and to
retain or discharge operators. Effective oversight requires companies
to identify and address performance shortcomings that their operators may
demonstrate at any time during their employment, but companies should
pay special attention to employees who may have difficulty mastering skills
during their probationary period because of the relative ease with which 
companies can deal with probationary employees compared to employees 
retained beyond their probationary periods. 


Experienced Operators 


Some experienced operators can present different types of safety chal- 
lenges to companies. They may perform satisfactorily during routine or 
expected situations but perform unacceptably when encountering nonrou- 
tine or emergency situations. Effective oversight should enable companies 
to identify those operators whose ability to respond to nonroutine situations 
is uncertain. Investigators, to the extent possible, should obtain company 
records of operator performance during both routine and nonroutine oper- 
ating periods, as well as records of company responses to operator perfor- 
mance deficiencies. 

Some operators may also knowingly violate company procedures when 
they are confident that company managers will not detect their actions, 
thereby endangering system safety. Here, too, companies must identify and
respond to such safety hazards. Because of the need to ensure that opera- 
tors are following necessary procedures, companies that take little or no 
action in response to unjustified violations of procedures create antecedents 
to error by effectively communicating to operators that procedural noncom- 
pliance will not be addressed. Ultimately, companies may have to remove 
from safety-sensitive positions operators who have disregarded operating 
procedures. 

Investigators observed the outcome of such an operator in their investiga- 
tion of a 1993 aircraft accident, involving a pilot whom peers had reported 
as ignoring critical procedures (National Transportation Safety Board, 1994). 
The airplane, owned and operated by the Federal Aviation Administration, 
struck the side of a mountain, killing him and two of his peers onboard, 
before air traffic controllers could clear the plane to climb through clouds 
to a higher altitude and safely depart the area. The pilot chose not to delay 
takeoff to wait on the ground for air traffic controllers’ authorization to pro- 
ceed to his destination, most likely in the belief that he could obtain it more 
readily once airborne. 

After the accident, pilots who had flown with the captain described to 
investigators multiple instances of his unsafe practices, reports that corre- 
sponded to the nature of his performance on the accident flight in his will- 
ingness to violate rules and procedures. According to the investigators, his 
fellow operators reported that the pilot had (National Transportation Safety 
Board, 1994), 


Continued on a VFR [visual flight rules] positioning flight into IMC
[instrument meteorological conditions], 

Conducted VFR flight below clouds at less than 1,000 feet above 
the ground in marginal weather conditions [violating safe operating 
practices], 

Replied to an ATC [air traffic control] query that the flight was in VMC 
[visual meteorological conditions] when it was in IMC, 

Conducted departures without other flightcrew knowing essential 
flight planning information, such as IFR [instrument flight rules]/VFR/ 
en route filing/weather briefing/ultimate destination or routing, 

Departed on positioning flights without informing other crewmem- 
bers whether he had obtained weather information or filed an appropri- 
ate flight plan, 

Disregarded checklist discipline on numerous occasions, 

Refused to accept responsibility that his failure to adhere to a checklist 
had caused an engine damage incident in January 1993, [an event that 
precipitated a letter of reprimand from his supervisors], 

Performed a “below glide path check” in IMC when VMC conditions 
were required by FIAO [the FAA organization operating the flight] 
requirements, and refused to answer a SIC [co-pilot] query regarding 
the reason for his alleged violation of VFR requirements in an incident 
2 weeks before the accident. (p. 8) 


Moreover, investigators were informed of reports that other pilots had 
made to the captain’s supervisors regarding the captain’s procedural viola- 
tions. Yet, despite repeated complaints, the supervisors failed to address his 
performance. Their failure allowed him to continue violating procedures as 
he did on the accident flight—continued visual flight operation into instru- 
ment conditions—without air traffic control authorization. Consequently, 
investigators determined that the supervisors’ role in this accident was 
equivalent to that of the captain. While their actions did not directly lead
to the accident, their inaction in the face of considerable information about his
unsafe practices allowed the pilot to knowingly violate a procedure that
caused the accident. As Reason (1997) notes, opportunities for “rogue” opera- 
tors to ignore procedures increase in organizations that have deficient over- 
sight, or have managers who are unwilling or unable to enforce compliance 
with the organization's operating rules and procedures. 


"Good" Procedures 


By establishing the circumstances under which operators interact with the 
systems, and by setting the tone for their “corporate culture,” companies and
organizations can positively affect operator performance. They can encour- 
age operators to keep management informed of perceived safety hazards, 
including instances of operator noncompliance with operating procedures. 
Reason (1997) recommended several techniques that companies can under- 
take to enhance the safety of system operations, including establishing a 
“reporting culture,” in which companies encourage employees to report
safety issues. Such reports, which can include an operator's own errors, when 
conveyed in a fair and nonpunitive atmosphere, can encourage operators and 
their supervisors to recognize and address system antecedents to error, thus 
serving as a critical component of a corporate culture that enhances a com- 
pany's operational safety. Companies can also develop and implement sal- 
ary and incentive programs that reward suggestions for improving safety. 
Where technical capabilities are in place, companies can read data from sys- 
tem recorders to monitor system state and operator performance. All have 
increased supervisor knowledge of potential safety issues in their operations. 





Formal Oversight Systems 


Since Reason (1990, 1997) offered suggestions to enhance system safety,
researchers have examined, and regulators and companies have developed and
implemented, specific techniques to this end. For example, Helmreich and his
colleagues studied techniques to expand crew resource management (CRM)
practices to include error management (e.g., Klinect, Wilhelm, and Helmreich,
1999), and Guldenmund's (2010) and Grote's (2012) studies of corporate safety
culture suggest techniques that companies can use to enhance safety. Today, 
international and domestic regulators have endorsed concepts to proactively 
enhance safety and have published manuals to assist companies to develop 
and implement them (International Civil Aviation Organization, 2002, 2013). 


Line Operations Safety Audit 


The Line Operations Safety Audit, or LOSA, was derived from research into CRM
by Helmreich and his colleagues (e.g., Klinect, Wilhelm, and Helmreich, 1999).
In LOSA, expert observers observe real world system operations (“line oper- 
ations”), and record crew actions and statements, both technical and team 
oriented, according to objective, predetermined assessment criteria. After 
the observation, the LOSA observers review the results with the operators 
and provide feedback on the technical and team-oriented quality of their 
performance, with suggestions for improvement, when warranted. LOSA 
is not designed to be used negatively, such as for criticizing or adversely 
rating performance, but rather constructively, to improve performance and 
enhance operational safety. 


Flight Operations Quality Assurance 


Flight Operations Quality Assurance (FOQA), like LOSA, was developed initially
in aviation but has since been implemented in other complex systems
as well. FOQA uses software to analyze system recorder data, such as data
on flight data recorders, not for the accident investigation purposes for
which the recorders were developed, but to monitor operator performance of
the systems in question (Federal Aviation Administration, 2004). Flight data
recorders, which had recorded five parameters in the analog era (heading,
altitude, airspeed, vertical acceleration, and microphone keying), now record
hundreds of digital aircraft system and flight parameters that give precise
indications of the airplane, its state in the minutes before the accident, and
pilot interactions with aircraft controls and major systems. Reading out flight
data recorders proactively enables airlines, as with LOSA, to monitor operator
performance in certain maneuvers at particular airports, for example,
and to determine whether additional training and/or procedural modifications
are needed. FOQA provides airlines with information about operator and
aircraft performance in real time, with multiple pilot crews and aircraft,
allowing them to learn of potential safety issues in the absence of an
accident or incident, or in the absence of operator or management
recognition of potential safety issues. Since
airlines have begun implementing FOQA, other industries with onboard 
system recorders, such as companies operating oceangoing vessels that are 
required to be equipped with voyage data recorders, have begun similar 
initiatives as well. 
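
The kind of screening that FOQA software performs on recorder data can be sketched in a few lines. The example below is only an illustration under stated assumptions: the parameter names, the limits, the airport codes, and the functions `find_exceedances` and `count_by_airport` are hypothetical, and actual FOQA programs define their own events and thresholds.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    flight_id: str
    airport: str
    parameter: str   # e.g., "descent_rate_fpm" (hypothetical name)
    value: float

# Hypothetical upper limits; real programs set their own event definitions.
LIMITS = {"descent_rate_fpm": 1000.0, "approach_speed_kt": 150.0}

def find_exceedances(samples, limits=LIMITS):
    """Return the samples whose value is above the limit for its parameter."""
    return [s for s in samples
            if s.parameter in limits and s.value > limits[s.parameter]]

def count_by_airport(exceedances):
    """Tally exceedances per airport to show where they cluster."""
    counts = {}
    for s in exceedances:
        counts[s.airport] = counts.get(s.airport, 0) + 1
    return counts

# Made-up recorder samples from four flights at two airports.
samples = [
    Sample("F1", "AAA", "descent_rate_fpm", 900.0),
    Sample("F2", "AAA", "descent_rate_fpm", 1300.0),
    Sample("F3", "BBB", "approach_speed_kt", 155.0),
    Sample("F4", "AAA", "approach_speed_kt", 160.0),
]
flagged = find_exceedances(samples)
by_airport = count_by_airport(flagged)
```

Tallying the flagged samples by airport, rather than by individual crew, mirrors the use described above: identifying maneuvers or locations where additional training or procedural modifications may be needed.
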


Safety Management Systems 


Safety Management Systems (SMS) are structured programs that enable 
companies to identify and mitigate risks to enhance the safety of their sys- 
tems. Unlike the previous two programs, SMS was developed initially in 
the marine system, when, in 1993, the IMO made the implementation of 
these programs mandatory, in response to the March 6, 1987, Herald of Free 
Enterprise ferry accident off the coast of Belgium, in which 188 passengers 
and crew were killed (Department of Transport, 1987). An SMS program is 
a systematic method of recognizing and mitigating risks in company operations.
Since its adoption in the marine industry, other systems, such as aviation,
have encouraged companies to implement SMS programs as well. The
Federal Aviation Administration (FAA, 2015) describes four elements of SMS 
programs: management policies, procedures, and organizational structures 
that accomplish the desired safety goals; a formal system of hazard identi- 
fication and safety risk management; controls to mitigate the risks (safety 
assurance); and a method of promoting safety as a core corporate value. 
These are among the methods that companies across different complex 
systems can use to proactively enhance safety. Companies and systems have 
employed programs that may be unique to the particular systems, or to the 
companies. It is important to note, however, that such programs are meant to 
be proactive and that investigators should not consider their absence, unless 
mandated by the regulator, to be indicative of an organizational antecedent
to error. 





Company Errors 


Researchers have examined differences between errors that can be attributed 
to an individual operator and those attributed to a company and its opera- 
tions. For example, Goodman, Ramanujam, Carroll, Edmondson, Hofmann, 
and Sutcliffe (2011) describe conditions that must be met for errors to be con- 
sidered organizational: 


First unintended deviations from organizational expectations...[regard- 
ing] work activities; second, the...actions of multiple individuals who 
are acting in their formal organizational roles and working toward 
organizational goals; third, [both of which] ...can potentially result in 
adverse organizational outcomes; and, finally, [both of which] are pri- 
marily caused by organizational conditions. (p. 154) 


That is, for an error to be organizational and not individual, more than 
one individual had to have been involved in acting or deciding, in a manner 
considered to be furthering an organization’s goals, in ways that would lead 
to adverse consequences. 

Strauch (2015), expanding on this concept, adds that to identify company 
antecedents, one of three conditions has to be met. Investigators must be
able to demonstrate that company officials (1) acted or made decisions in the 
face of information alerting them to the need for different actions or deci- 
sions, (2) acted or decided in the face of self-evident information of the need 
for corrective action, or (3) took no action or made no decision when an action 
and/or decision was warranted. 

Information on the need for corrective action can include a history of simi- 
lar accidents or incidents in a relatively brief period, operator or manager 
reports of safety deficiencies, FOQA, LOSA, and SMS data, regulator-cited 
infractions, patterns of failures on operator examinations of proficiency, and 
patterns of maintenance deficiencies. In the face of such evidence, company 
action to address the information provided is warranted and inaction should 
be considered a company antecedent to error. Illustrations of self-evident 
data include work schedules that are fatigue-inducing, punitive oversight 
programs, and publicly berating operators who commit errors. 

Unfortunately, illustrations of organizational accidents are not common 
in the investigation literature; identifying them as such is a relatively recent 
phenomenon. Whether it is not acting on indications of safety shortcomings, 
deciding not to improve training when data calling for such improvement 
is manifest, or tolerating bullying management, investigators have come to
recognize and describe companies’ roles in accidents. 





Case Study 


On July 25, 2010, a 30-inch-diameter segment of a petroleum pipeline rup- 
tured near Marshall, Michigan (National Transportation Safety Board, 
2012). The rupture was caused by a defect in a weld in the pipeline. 
Operators in the company’s Edmonton, Alberta, Canada, pipeline control 
center failed to detect the rupture for over 17 hours, in spite of multiple 
indications of a pipeline fault near Marshall, Michigan. Rather, they inter- 
preted the aural alerts and visual display information not as a rupture but 
as a separation in a column of oil in the pipeline, a result, they believed, of
the effects of hilly terrain on the column of petroleum within the pipeline.
While such terrain is known to cause column separation, because gravitational
forces differentially affect oil flow within a pipeline with changes
in terrain, the location of the leak was in a relatively flat area and the con- 
trol center operators, all from Alberta, were unfamiliar with the terrain in 
Michigan. As a result of their misinterpretation of the alerts and displays, 
the operators continued pumping oil through the ruptured pipeline on two 
separate occasions, for over 1½ hours, until they were informed of the leak
by personnel in Michigan, 17 hours after the initial rupture. Only upon 
notification from those near the site of the rupture were they able to cor- 
rectly recognize that a pipeline rupture had occurred. Over 800,000 gallons 
of crude oil were released into the adjacent wetlands and a nearby creek 
and river. The cost of the cleanup, which continued over a year after the 
accident, exceeded $1 billion USD. 

Investigators found several company antecedents that led to the control- 
ler and supervisor failures to detect the rupture. Procedures developed 
specifically to ensure that ruptures would be detected were violated. For
example, controllers were required to stop oil flow after 10 minutes if 
alarms continued. However, in the belief that additional pressure from the 
pumps would join the separated oil column, pumping was allowed to con- 
tinue well beyond that limit. In addition, the performance of the operator 
teams—and of the teams of operators and their supervisors—broke down, 
resulting in ambiguous supervisory chains, in which supervisors deferred 
decisions to subordinates who lacked the necessary expertise, limiting the 
team’s ability to effectively analyze the rupture-related alarms and dis- 
plays. Further, investigators found that training exercises that the company 
had conducted for operators were invariant, and over time, the exercises 
failed to present the controllers with realistic scenarios that could have 
prepared them to effectively diagnose and respond to system anomalies. 
“They have some preconfigured programs,” one controller told investigators,
“that we run and some of them have station lockouts and some of
them have leaks and some of them have just com [communications devices] 
fails and different scenarios that we go through to help us to understand 
what we're seeing” (National Transportation Safety Board, 2012, p. 48). In 
addition, control center supervisors were not required to take the recurrent
training that the operators had been required to take, yet they nonetheless
played key roles in analyzing, incorrectly, the post-rupture alarms and
displays. These safety lapses, as well as similar findings that investigators 
obtained in previous investigations of company incidents, had been known 
to control center supervisors and company managers, but they were not 
addressed. In fact, investigators found that supervisors had used the les- 
sons of earlier incidents to justify bypassing and violating company pro- 
cedures in the Marshall, Michigan, accident. Company supervisors and 
operators had information regarding the need for alternative courses of 
action, yet they did not act on it, thereby creating antecedents to the 
errors of the operators and their supervisors. This accident illustrates the 
types of company antecedents that can lead to operator (and 
supervisor) errors, which in this case exacerbated the effects of the rupture, creating a 
serious environmental accident. 





Summary 


Accident investigators as well as students of human error have come to 
recognize the role of companies that operate complex systems in creating 
antecedents to errors in those systems. The selection processes used to hire 
system operators can identify recognizable or predictable operator skills and 
deficiencies, and companies thus influence safety through the quality of the operators they hire. 
Companies determine the optimum number of operators needed to run their 
systems during both routine and non-routine system states, they train opera- 
tors to safely operate the systems, and they conduct recurrent training, to 
enable operators to remain current with changes in system design or operat- 
ing procedures. Companies also establish operating procedures and, within 
reason, enforce adherence to those procedures. Procedures should guide 
operators to interact with systems throughout the range of expected system 
states, yet give operators sufficient flexibility when the procedures 
do not apply to an unexpected event. Several systems have developed 
and implemented programs to proactively enhance safety. Companies 
that have information about the need to address safety deficiencies but do not act 
on it, or that fail to act on safety lapses in system performance that they 
should have recognized, are considered responsible for 
antecedents to errors in their systems. 


DOCUMENTING COMPANY ANTECEDENTS 


* Refer to company manuals and related written documenta- 
tion, interview managerial and operations personnel, human 
resource specialists, experienced operators, and both newly 
hired operators and, if possible, applicants who were not hired, 
to obtain their accounts of the selection process, training, pro- 
cedures, and oversight. 


SELECTION 


* Determine the extent to which both the company’s selection 
criteria and selection process changed over a period of several 
years. 


* Match company-employed selection criteria to the skills that 
operators are expected to perform routinely. Note inconsisten- 
cies between the two, and determine the extent to which the 
process can adequately predict effective operator performance 
among applicants. 


* Assess the extent to which the company recognizes operator 
deficiencies during their probationary period and after they 
have fully qualified as operators. 


* Determine the number and proportion of operators in each 
of several years from the time of the accident, who were not 
retained beyond the conclusion of their probationary periods, 
or whose employment was terminated thereafter, because of 
performance deficiencies. 


* Describe performance deficiencies that led to the company 
actions. 


TRAINING 


* Compare the content of company training to the knowledge 
and skills operators need to effectively and safely control 
systems. 

* Document the training material presented, the methods of 
instruction, and instructional media, such as control station 
simulators, that the organization uses in training. 

* Determine the extent to which the content of company training 
pertained to the event under investigation. 

* Assess the extent to which company training surpasses, meets, 
or falls below the minimum level of instruction mandated. 


PROCEDURES AND OVERSIGHT 


* Determine the extent to which the operating procedures that 
the company established prepared operators to respond to the 
event being investigated. 


* Document the type and frequency of company oversight over 
a period of time up to the accident. 


* Determine the extent to which oversight informed the com- 
pany of operator application of company procedures. 


HISTORY 

* Document previous company accidents, incidents, and regula- 
tory violations. 

* Document operator reports to the company of safety concerns. 

* Document company responses to accidents, incidents, regula- 
tory violations, and/or operator safety concerns. 

* Determine the extent to which errors or safety deficiencies 
resemble previous accidents, incidents, violations, and/or operator 
safety concerns, and the company's response to them. 


President Park Geun-hye of South Korea vowed on Monday to disband 
her country’s Coast Guard, saying that South Korea owed “reform and a 
great transformation” to hundreds of high school students who died in 
a ferry disaster last month. 


Choe, 2014 
New York Times 





Introduction 


In one form or another, regulators are essential components of complex 
systems. They provide a degree of assurance to the public that the complex 
systems it depends upon are safe, and they regulate the systems to ensure 
that minimum standards of safe operation are maintained. Regardless of 
the complexity of a particular system, or of how unfamiliar people may be 
with its operational safety measures, a government agency is 
typically responsible for overseeing the safety of the system. Whether one 
boards an airplane or a ferry, or turns on an electrical appliance that receives 
its electrical power from nuclear energy, an agency of some kind regulates 
the particular industry. Throughout the world, some level of independent 
supervision and inspection of complex systems has become expected. In the 
United States, with some exceptions, federal agencies carry out the oversight. 
In Australia, individual states oversee the railroads. 

The level of regulator oversight over a company’s operations varies among 
countries, industries, and regulators. Some oversee relatively minute aspects 
of company operations while others play a less active role in the systems they 
oversee. In general, the more consequential the potential adverse effects of a 
system accident, the more likely that a regulator will be involved in oversee- 
ing the system. 

The importance of the regulator to system safety can be seen in the marine 
environment. Ferry accidents, in which hundreds of people have been killed, 
have occurred in countries with large inter-island ferry transportation systems, 
but with relatively weak regulator oversight. For example, the April 16, 2014, 
accident in South Korea involving the ferry Sewol, referred to at the beginning 
of this chapter, killed over 300 people. After the accident, 
allegations arose over lax oversight of the ferry. The agency responsible 
for overseeing the safety of the vessel was reported to have failed to recog- 
nize how changes to the vessel's structural design could affect its stability, 
which, if true, is a critical regulatory oversight error. 

Depending on the industry, regulators establish standards for operator 
licensing and training, maintenance and inspection, operating procedures, 
operator medical standards, and in some industries even organizational 
structure, activities that parallel many of those of organizations and com- 
panies. A weak or ineffective regulator communicates to the industry it 
oversees, and to the operators of that industry, that deficient performance, 
whether deliberate or inadvertent, will be overlooked. 

Given the similarity of the regulator oversight tasks to those of companies, 
investigators can assess the adequacy and effectiveness of regulator perfor- 
mance, and of regulator antecedents to error, in ways that are similar to those 
conducted for companies, by examining the role of many of the antecedents 
previously outlined. For example, the effectiveness of rules governing equip- 
ment design features can be gauged by the standards of data conspicuity, 
interpretability, and other characteristics discussed in the preceding chapter. 





What Regulators Do 


Regulators primarily perform two functions that are critical to system safety. 
They establish rules ensuring the safety of system operation and equip- 
ment design, and they enforce compliance with those rules. The rules are 
designed to provide a minimum level of safety to the systems they over- 
see. Organizations that operate at that level would be expected to operate 
safely. Of course, exceeding those levels of safety would be encouraged, but 
not required, by the regulator. Regulator antecedents are either a function 
of inadequate rules, poor oversight, or both, and investigators must exam- 
ine whether regulator oversight, in either capacity, led to errors that caused 
accidents. 

Regulators are expected to establish sufficiently rigorous rules, methods, 
and standards to provide a minimally acceptable standard of public safety. 
Yet, at the same time, overly restrictive regulations may inhibit operations 
and thus a company’s ability to operate profitably. As a result, regulators face 
pressures unlike those of other system elements. They are asked to oversee 
the safety of systems with potentially catastrophic consequences to system 
malfunction, without restricting the freedom of companies to operate their 
systems profitably. 

Reason (1997) refers to “the regulators’ unhappy lot,” in which they are 
caught between demands for absolute system safety, resistance to what may 
be considered excessive oversight, and a public that is often unwilling to 
provide regulators the resources necessary to enable them to carry out their 
missions effectively. As a result, regulators whose work is effective, as shown by 
continued system safety, rarely attract public attention. Reason (1997) points out 
that the public may consequently become unwilling to provide the resources 
regulators need to enable them to conduct the oversight needed to effec- 
tively monitor the safety of the systems that they are tasked with oversee- 
ing. Since the benefits of effective regulatory oversight are mainly seen in 
their absence, such as after an accident has occurred, reducing the funding of 
regulators may appear to be a relatively painless way to reduce the expendi- 
ture of public funds while at the same time not inhibiting perceptible levels 
of public safety. As Reason (1997) notes, this leads to less than satisfactory 
consequences. 


In an effort to work around these obstacles, regulators tend to become 
dependent upon the regulated companies to help them acquire and 
interpret information. Such interdependence can undermine the regula- 
tory process in various ways. The regulator’s knowledge of the nature 
and severity of a safety problem can be manipulated by what the reg- 
ulated organization chooses to communicate and how this material is 
presented. (p. 174) 


Further, regulators also face pressure to maintain their expertise in the 
industries they oversee in the face of continued technological advances. As 
the pace of technological change increases, regulators find it increasingly 
difficult to effectively evaluate systems and the procedures needed to oper- 
ate them. Yet, as new hardware and software are introduced into service, 
regulators are expected to anticipate potential operating errors in the use of 
the new systems, even if they lack the technical expertise to evaluate them. 
As Reason (1997) explains, 


Regulators are in an impossible position. They are being asked to prevent 
organizational accidents in high-technology domains when the aetiol- 
ogy of these rare and complex events is still little understood. (p. 171) 


Regulator Activities 


Given the two functions of regulators, enacting rules and enforcing those 
rules, regulator antecedents to error fall into one of those categories. As with 
other antecedents, investigators must work backward from the error through 
the system to identify the regulator antecedents to error. The nature of the 
antecedent determines the resultant error. 


Enacting Rules 


There tend to be fewer antecedents from regulator shortcomings in the rules 
that govern the industry than in those related to rule enforcement, because 
over time, as regulatory shortcomings become recognized, often through 
accidents, regulators tend to rectify the shortcomings by enacting 
rules that address particular deficiencies. For example, in the 1970s and 
1980s, as aviation accidents involving well-designed and well-maintained 
aircraft due to crew error continued to occur, regulators recognized that 
pilots needed to be trained in CRM. In response, in 1998, the U.S. Federal 
Aviation Administration required pilots of air transport aircraft to complete 
CRM training (Federal Aviation Administration, 2004). 

This type of regulatory deficiency could be seen in a ferry accident that 
occurred in New York City in 2003 (National Transportation Safety Board, 
2005). The vessel operator experienced what investigators termed an “unex- 
plained incapacitation” as he was about to dock the vessel. As a result, he 
did not slow the vessel as it neared the dock and it crashed into the dock, 
killing 11 passengers. The investigation found that the operator had been 
taking a prescribed pain medication whose side effects included seizures, 
a possible factor in explaining his incapacitation. Yet, the Coast Guard, the 
federal agency that regulates U.S. marine operations, had no prohibition in 
place to warn mariners against using the medication. Nonetheless, fearing 
suspension of his license, the mariner did not report using the medication 
to the Coast Guard. As a result, investigators identified shortcomings in the 
Coast Guard’s medical oversight system, but they did not consider those to 
have played a part in the accident. Investigators identified these shortcom- 
ings as safety concerns in their report and recommended that the Coast 
Guard upgrade its system of medical oversight of mariners to bring it in 
line with the medical oversight that other federal transportation regulators 
(such as the Federal Aviation Administration) conducted. To its credit, the 
Coast Guard agreed and over several years considerably upgraded its system 
of medical oversight of mariners to a level considered equivalent to that of 
other transportation regulators. 

In some systems, such as aviation and marine, both domestic and interna- 
tional regulators establish rules governing system operations, through agen- 
cies that are entities of the United Nations. In aviation, this is carried out by 
the ICAO and in marine, by the IMO. In both transportation modes, their 
rules, once adopted, are enforced by countries on behalf of the international 
regulators, which have no enforcement authority themselves. 

When investigating the role of the regulator in an accident, document the 
applicable regulations and determine whether the errors identified in the 
accident had been addressed in the existing regulations. Be aware of rules 
that are so general as to be of little value in actually ensuring safety. For 
example, pilots in the United States, after an accident due to human error, 
can be charged with violating rule 14 Code of Federal Regulations 91.13, for 
so-called “careless and reckless” operation. The rule states: 


a. Aircraft operations for the purpose of air navigation. No person 
may operate an aircraft in a careless or reckless manner so as to 
endanger the life or property of another. 


Because the rule does not distinguish between an error due to equipment 
design or training, oversight, and so on, a pilot involved in any accident in 
which error played a part can, in effect, be charged with violating this rule. 


Enforcing Rules 


Antecedents to shortcomings in regulator performance tend to fall in this 
general category, typically in accidents in which the regulator had information 
indicating, or should have recognized, that an entity’s operational safety was 
deficient. For example, when identifying errors resulting from ineffective 
training that was required to meet certain regulatory standards, 
investigators should determine whether the regulator 
should have recognized and addressed the training shortcomings and, if so, why 
they were not addressed. 

Investigators of a marine accident that occurred 4 years after the previ- 
ously discussed ferry accident similarly identified an operator whose use 
of prescription medications with impairing side effects played a role in the 
cause of the accident (National Transportation Safety Board, 2009). Unlike 
the previous accident, the mariner had informed the Coast Guard of some, 
but not all, of the drugs he had been taking, but the Coast Guard, which by 
this time had upgraded its medical oversight system, failed to follow up on 
the mariner’s drug use to determine whether the prescribed medications he 
had reported to the regulator adversely affected his performance 
(they did). 

Because the regulator, the Coast Guard, had information about this mar- 
iner’s use of impairing prescription medications, and because it had been 
cited previously for its deficient oversight of mariner medical status but still 
permitted the mariner to operate while using impairing medications, inves- 
tigators determined that the regulator had played a role in the cause of the 
accident. As they wrote: “Also contributing to the accident was the U.S. Coast 
Guard's failure to provide adequate medical oversight of the pilot in view of 
the medical and medication information that the pilot had reported to the 
Coast Guard” (National Transportation Safety Board, 2009, p. 136). 

This accident also demonstrates that a regulator rarely 
directly causes an error that leads to an accident, as would, for example, 
a company that operated a system unsafely. Regulators do not cause 
accidents, but by not preventing companies from operating unsafely, regulators 
can be considered to contribute to accidents rather than cause them. 
Alternatively, by failing to ensure compliance with its regulations, regulators 
can be considered to have permitted companies to cause accidents through 
their own actions (or inactions). 




Case Study 


Regulators play a role in overseeing the safety of many systems, in addition 
to the ones that we typically consider. For example, in one famous incident, 
the regulator of the U.S. financial industry failed to act on information that 
it had suggesting the need for a higher level of enforcement than what it had 
been providing. On December 10, 2008, Bernard Madoff, a well-respected 
financier who had headed a leading electronic securities exchange, con- 
fessed to his sons that the investment firm that he headed, Bernard L. Madoff 
Investment Securities, was a Ponzi scheme. A Ponzi scheme is an illegal 
enterprise in which the operator takes money from one investor and gives it 
to another, pocketing some of the money for himself or herself, by promis- 
ing the “investor” high returns on the money provided. The sons alerted the 
U.S. federal regulator of the financial industry, the Securities and Exchange 
Commission (SEC), of their father's activities and the next day Madoff was 
arrested for securities fraud, among other criminal charges. Madoff pled 
guilty to multiple counts of securities fraud 7 months later, at the age of 76, 
and was sentenced to 150 years in prison. 

Bernard Madoff's fraud was a classic Ponzi scheme, a type of fraud named 
for Charles Ponzi, who, according to the SEC, defrauded thousands of per- 
sons in New England in a phony postage stamp investment scheme (SEC, 
2009). Newspaper accounts said that investors lost an estimated $17 billion in 
the money that they had given Madoff to invest. Counting the paper losses 
from the fraud, that is, the cash losses with the loss of the income Madoff's 
fund was alleged to have generated, the total loss that investors sustained 
was estimated to exceed $60 billion. Among the defrauded investors were 
close Madoff friends, neighbors, relatives, and the late Nobel Prize winner, 
Elie Wiesel, who lost his life savings and whose charitable organization lost 
over $15 million (Strom, 2009) in the scheme. 

The SEC establishes financial reporting requirements for and governance 
of publicly owned corporations, and the trading of equities in those corpora- 
tions, among its other responsibilities. Its role is vital to the integrity of the 
U.S. financial system. Ineffective oversight by the SEC, a regulator that the 
federal government established during the Great Depression to oversee an 
industry whose fraudulence had led to widespread financial losses among duped 
investors, could lead to financial catastrophe, the financial equivalent of a 
catastrophic accident in a complex system. 

After Madoff's arrest, it was learned that the SEC had, on multiple occa- 
sions, been informed of suspicions regarding Madoff's trades. Not only had 
several investors reported their suspicions to the Commission, one had filed 
three separate, signed complaints and then met with SEC officials to explain 
why Madoff's purported returns, which he was required to regularly report 
to the SEC and to his investors, could not have been legally obtained. Further, 
two periodicals, one, Barron’s, a widely read and well-respected U.S. busi- 
ness publication, published articles in May 2001, 7 years before Madoff's 
arrest, suggesting improprieties with Madoff's securities. 

In addition to the suspicions raised by complainants to the SEC and by 
financial publications, numerous “red flags” or suspicions regarding the 
nature of Madoff's alleged investments should have been evident to the regulator 
in its oversight of the Madoff securities company. As the investigators of 
the SEC’s failure, its Office of the Inspector General, reported: 


* The returns on Madoff's investments were consistent and largely 
unrelated to actual market performance over a 14-year period, a 
highly unusual result through several market downturns 


* Madoff did not charge fees per trade, as was the Wall Street practice 


* His timing of trades, that is, selling before a downturn and buying 
before the market turned up, was consistently successful, something 
that is also highly unusual 


* The outside auditor of his fund (an SEC requirement analogous to an 
independent auditor of an SMS system) was his brother-in-law and 
not a major public accounting firm 


* Several investors who considered investing in Madoff's funds, and 
who then examined the funds closely (conducting “due diligence”) 
using publicly available information, became suspicious of the funds 
and would not invest in them 


* The alleged financial strategy that the Madoff fund employed could 
not be duplicated by other investors 


In response to both the complaints it received and the publication of the 
articles about Madoff's fund, the SEC conducted at least two examinations 
of Madoff's securities, one in 2004 and one in 2005. Neither uncovered the 
Ponzi scheme. How then did the federal regulator responsible for oversee- 
ing the U.S. financial industry miss discovering Madoff's fraud, one that 
was relatively unsophisticated (as are all Ponzi schemes) and one that pri- 
vate investors became suspicious of using publicly available information? 
To answer this question, the SEC's Office of the Inspector General (OIG), 
an independent monitor established by the federal government to provide 
impartial examinations of the performance of federal agencies, conducted 
an investigation into the SEC’s failures with regard to its oversight of Madoff's 
securities. 

What made the SEC's failure particularly troublesome was its history and 
role in U.S. financial oversight. The SEC had 


* Over 60 years of experience in overseeing the financial industry 
* Considerable in-house expertise at detecting fraud in the industry 
* Promulgated the rules that Madoff was accused of violating 


Underlying the OIG investigation was the knowledge that had the fraud 
been discovered sooner, for example, at the times of its own investigations, 
investors would have saved billions of dollars in losses because Madoff's 
activities would have been terminated and many investors would not 
have lost their life savings. 

The OIG report (SEC, 2009) describes, in considerable detail, the errors of a 
regulator tasked with considerable responsibility to oversee a vast and com- 
plex system. 

These included: 


* Failing to properly oversee those SEC officials charged with examin- 
ing Madoff's securities 


* Selecting examiners who lacked the necessary expertise to address 
the allegations against Madoff 


* Failing to follow up on overt discrepancies in Madoff's statements 
to examiners 


* Failing to understand the nature of Madoff's alleged financial crimes 
and not making the effort to understand them 


* Failing to recognize the significance of the suspicions raised against 
Madoff's funds 


* Ineffectively communicating findings of one investigation to those 
conducting the second, resulting in considerable duplicated efforts 


* Failing to verify information Madoff provided with readily obtain- 
able information from outside entities that would have demonstrated 
the deceptiveness of Madoff's claims 


The report cited a litany of mistakes and misjudgments that individuals 
within the agency committed regarding Madoff, including repeated errors by 
the same persons. Although the report cited errors that, as with many errors, 
may appear to be inexplicable in hindsight, implicit in the report is the sense 
that some SEC managers were trying to meet the agency's many responsibilities, 
in the face of numerous mandated tasks, with limited resources. 
Investigators also determined that some SEC personnel, like many outside the 
agency, were impeded in their efforts by their difficulty in believing that 
someone as well-regarded in the financial industry as Bernard Madoff, with 
many prominent investors among his clients, could have been the perpetra- 
tor of a simple Ponzi scheme. 

Regulator personnel did not cause the Madoff Ponzi scheme, but they 
failed to uncover it, despite what was shown to have been considerable avail- 
able information suggesting the scheme, and in one instance outlining it. 
This failure was due to bureaucratic shortcomings in the performance of the 
agency, managers who chose people for the Madoff investigation who lacked 
the necessary expertise, and thereafter provided them with little guidance 
and follow up. Further, these managers also failed to understand the nature 
of the scheme and avoided opportunities to understand it. The failure of 
the SEC to recognize the fraud, after suspicions regarding Madoff were first 
raised in the years before the scheme collapsed, cost investors billions of dol- 
lars, including many who lost their life savings. The regulator did not cause 
the fraud, but its failure contributed to its severity. 





Summary 


Regulators play critical roles in the safety of complex systems. They estab- 
lish the rules governing the operations of the systems they oversee and they 
enforce compliance with those rules to ensure system safety. Regulators 
may establish and enforce rules governing system design, system operation, 
maintenance, and personnel qualifications. Many of the antecedents relat- 
ing to equipment design and organizational antecedents apply to regulator 
antecedents as well. Regulator antecedents to error do not cause accidents 
directly but, by not preventing organizations from operating unsafe systems, 
allow errors to occur that can lead to accidents. 

Typically, regulator shortcomings arise from failures to enforce rules, 
rather than from not enacting them. However, in some instances, accidents 
have occurred in which regulators were shown to have failed to enact ade- 
quate rules. In one accident, the regulator did overhaul its oversight system 
and tighten its rules governing medical oversight, but through inadequate 
enforcement allowed violations of its rules to lead to a subsequent accident. 


DOCUMENTING REGULATOR ANTECEDENTS 


* Examine regulator inspector selection criteria and inspector 
training and determine their competence at assessing the 
safety of a company’s operations. 

* Apply the presentation and control design standards outlined 
in Chapter 4 to assess the quality of the regulator’s oversight 
and approval of the design of equipment used in the system 
being overseen. 

* Determine the extent to which the regulator effectively 
assessed the technology incorporated in the equipment. 

* Evaluate the extent to which the regulator’s operator-licensing 
requirements provided an acceptable level of safe system 
operation. 

* Examine the effectiveness of regulator approval of operating 
rules and procedures as applied to the circumstances of the event. 

* Determine the number of inspections of a company, and the 
thoroughness of those inspections, to determine the extent to 
which the regulator met the oversight standards it established. 

* Identify information the regulator had, such as previous com- 
pany incidents, accidents, and rule violations, indicating the 
need for additional oversight and determine whether addi- 
tional oversight was conducted. 

* Determine how responsive the regulator was in addressing 
safety deficiencies that it had identified. 





References 


Choe, Sang-Hun. 2014. South Korea to disband Coast Guard, leader says. The New 
York Times. May 19, 2014. 

Federal Aviation Administration. 2004. Crew Resource Management Training. AC No: 
120-51E. Washington, DC: Federal Aviation Administration. 

National Transportation Safety Board. 2005. Allision of Staten Island Ferry Andrew 
J. Barberi, St. George, Staten Island, New York, October 15, 2003. Report Number 
MAR-05-01. Washington, DC: National Transportation Safety Board. 

National Transportation Safety Board. 2009. Allision of Hong Kong-registered 
containership M/V Cosco Busan with the Delta Tower of the San Francisco-Oakland Bay 
Bridge, San Francisco, California, November 7, 2007. Report Number MAR-09-01. 
Washington, DC: National Transportation Safety Board. 

Reason, J. T. 1997. Managing the risks of organizational accidents. Aldershot, England: 
Ashgate. 

Securities and Exchange Commission (SEC). 2009. Investigation of Failure of the SEC 
to Uncover Bernard Madoff's Ponzi Scheme—Public Version. Report No. OIG-509. 
Washington, DC: Securities and Exchange Commission. 

Strom, S. 2009. Elie Wiesel levels scorn at Madoff. The New York Times, February 26. 


8 


Culture 











At Korean Air, “such teamwork has been nearly impossible,” says Park 
Jae Hyun, a former captain and Ministry of Transportation flight inspector. 
Its cockpits have operated under an “obey or else” code, he says. 
Co-pilots “couldn’t express themselves if they found something wrong 
with a captain's piloting skills.” 


Carley and Pasztor, 1999 
Wall Street Journal 




Introduction 


On August 6, 1997, a Korean Air Boeing 747 crashed into a hill several miles 
short of the runway in Guam, killing 228 passengers and crewmembers (see 
Figure 8.1). This accident was the latest in a string of major accidents that 
investigators had attributed, at least in part, to errors that the airline’s pilots 
had committed (National Transportation Safety Board, 1999). 

In the 16 years before this accident the airline had experienced one of the 
highest accident rates of any airline. Its accidents included the following:


* August 1983: A Boeing 747 deviated more than 300 miles off course
into Soviet territory before the Soviet Air Force shot it down

* December 1983: A DC-10 crashed in Anchorage after the pilots
attempted to take off from the wrong runway

* July 1989: A DC-10 crashed in Libya after the crew mishandled an
instrument approach

* August 1994: An Airbus A300 crashed in Cheju, Korea, after the
crew landed at an excessive airspeed


Even after the Guam accident, the most serious event the airline had
experienced since the Soviet Air Force shot down its Boeing 747, pilot error-related
accidents continued to plague the airline. These included the following:
FIGURE 8.1
The site of the Boeing 747 accident in Guam. (Courtesy of the National Transportation Safety 
Board.) 


* August 1998: A Boeing 747 crashed after the captain misused the 
thrust reverser while landing at Seoul 


* September 1998: An MD-80 ran off the end of the runway at Ulsan,
Korea 


* March 1999: An MD-80 ran off the end of the runway at Pohang, 
Korea 


* April 1999: An MD-11 freighter crashed in Shanghai


In 1999, the Wall Street Journal implied that factors rooted in Korean society 
and culture affected the airline and its safety record (Carley and Pasztor, 
1999). The newspaper stated that, 


Korean Air’s history has emphasized hierarchy. It is easy to discern the 
hierarchy: former [Korean] air force pilots, then fliers from other military 
services, and Cheju men [civilians that the airline trained] at the bottom. 
Red-stone rings worn by Korean Air Force Academy graduates com- 
mand instant respect. And ex-military men, while training co-pilots in 
simulators or during check rides, sometimes slap or hit the co-pilots for 
mistakes. In the cockpit, friction and intimidation can cause trouble. For 
a civilian co-pilot to challenge a military-trained captain “would mean 
loss of face for the captain,” says Mr. Ludwin...[a] former Pan Am cap- 
tain. For the co-pilot, he adds, “it’s more honorable to die, and sometimes 
they do.” (Carley and Pasztor, 1999, pp. A1, A2) 
The newspaper raised a critical issue by suggesting that specific Korean
cultural factors adversely affected the airline’s safety. Was it correct in its 
implication? Can cultural influences serve as antecedents to errors? This 
chapter will examine cultural factors in complex systems and discuss the 
relationship of cultural factors to operator performance. 





National Culture 


Cultural influences affect and are manifested in the behavior of people who 
work in the same companies, live in the same regions, and belong to the 
same ethnic groups. Some have suggested that cultural factors affect opera- 
tor performance (e.g., Orasanu, Fischer, and Davison, 1997) and hence, system 
safety. “Performance of a plant,” Moray (2000) notes, “is as much affected... 
by the expectations of society, as by the engineering characteristics of the 
design and the ergonomics of individual work and the design of communi- 
cation within and between groups and teams” (p. 860). 
Schein (1990, 1996) defines a culture as, 


(a) a pattern of basic assumptions, (b) invented, discovered, or developed 
by a given group, (c) as it learns to cope with its problems of external 
adaptation and internal integration, (d) that has worked well enough to 
be considered valid and, therefore (e) is to be taught to new members as 
the (f) correct way to perceive, think, and feel in relation to those prob- 
lems. The strength and degree of internal consistency of a culture are, 
therefore, a function of the stability of the group, the length of time the 
group has existed, the intensity of the group’s experiences of learning, 
the mechanisms by which the learning has taken place, and the strength 
and clarity of the assumptions held by the founders and leaders of the
group. (1990, p. 111)


The assumptions that Schein refers to are commonly known as norms: the
ways that people act, perceive, and interpret the values that they share. Norms
may be unspoken, but they are nevertheless perceived, felt, and practiced by
members of a group, and are recognized by members of their culture.
They help group members understand expectations, customs, and beliefs, and
so facilitate integration, acceptance, and shared beliefs regarding behavior.
For those outside the culture, they can serve to demarcate exclusion.

Cultural norms help travelers understand how to act when meeting mem- 
bers of a different culture. Cultures can differ in childrearing practices, 
courtship behavior, and deference to the elderly, for example, and those from 
different cultures may be uncomfortable with those differences. Avoiding 
behaviors considered offensive or insulting in different cultures is key to 
harmonious relations with members of those cultures. 
Much of the recent work relating cultural factors to system safety was
influenced by Hofstede (1980, 1991), and his study of a multi-national cor- 
poration, IBM, then with offices in 66 countries. In the 1960s and 1970s he 
administered a Likert-type questionnaire to company employees in their 
offices across the globe. With this type of questionnaire respondents are 
given statements and asked to agree or disagree with the statements on a 
scale generally of one, strongly disagree, to five, strongly agree. He found 
differences on several dimensions of behavior among the employees of the 
different cultures. One, which he termed “power distance,” refers to the extent to
which people perceive differences in status or power between themselves
and their subordinates and superiors. In cultures with high power distance, 
subordinates and supervisors perceive the differences between them to be 
greater than do those in cultures that score low on power distance. In those 
cultures, subordinates would be less willing to confront a superior, or call 
a superior’s attention to an error that he or she may have committed, than 
would their counterparts in countries with low power distance. 

A second dimension, “individualism-collectivism,” characterizes the 
degree to which individuals accept and pursue the goals of the group to 
which they belong, relative to their own individual goals. An individually 
oriented person is more self-sufficient and derives more satisfaction from 
pursuing personal goals than from pursuing group goals. Collectivist- 
oriented persons identify more with the companies that employ them than 
do individually oriented persons. They tend to view errors as reflections of 
the company or group as much as of themselves as individuals. 

“Uncertainty avoidance,” a third dimension, refers to the willingness or 
ability of people to contend with uncertain or ambiguous situations. People 
in cultures with high uncertainty avoidance generally find it difficult to deal 
with ambiguous or unclear situations that have few applicable procedures. 
They would be expected to respect and adhere to rules more readily than 
would their counterparts in cultures with low uncertainty avoidance. Those
in cultures that are low in uncertainty avoidance feel comfortable responding
to uncertain or novel situations that they have not experienced before, or
to situations to which few rules and procedures apply.

He labeled a fourth dimension “masculinity-femininity.” Masculine cultural
traits refer to assertiveness and toughness, and are focused on material
success. Feminine traits are considered to be more modest and tender, and
are concerned with the quality of life.

When Hofstede initially conducted his research China was relatively unin- 
volved in the global economy and IBM had no offices in China. Since then 
China has become fully integrated into the world’s economy and Hofstede 
applied his inventory to people in China. The result was a fifth dimension, 
long-term and short-term orientation (Hofstede and McCrae, 2004). Long- 
term cultural traits stress thrift and perseverance while short-term orien- 
tation, reflecting traditional Asian or Confucian traits, emphasizes social 
obligations and tradition. 
Some researchers have corroborated Hofstede’s findings in a variety of
settings and identified additional differentiating cultural characteristics 
(e.g., Helmreich and Merritt, 1998). Maurino (1994) described five cultural 
dimensions among operators in aviation that are related to those Hofstede 
identified. These include adherence to authority compared to a participative 
and democratic approach, inquiry in education and learning as opposed to 
rote learning, identification with the group rather than identification with 
the individual, calm and reflective temperament compared to a volatile and 
reactive one, and free expression and individual assertiveness compared to 
deference to experience and age. 

Helmreich and his colleagues argued that the inability of early CRM 
programs to substantially impact the quality of operations was due to the 
influence of cultural factors and the application of a Western crew model of 
CRM to non-Western cultures. As a result, factors such as the critical role of 
junior officers in safety, a fundamental precept of CRM, would be difficult to 
accept in some non-Western cultures that stress rank and status. Helmreich,
Merritt, and Wilhelm (1999) and Helmreich, Wilhelm, Klinect, and Merritt
(2001) contend that these programs were developed in the United States and
that difficulties arose when they were applied in other cultures. When
CRM programs were implemented in countries where employees scored high on
power distance, junior operators resisted efforts to be more assertive in
dealing with their superiors, and senior operators did not accept their
subordinates as fully contributing team members, thus negating many of the
perceived benefits of CRM programs.

Researchers have applied Hofstede’s work to a complex system, military 
aviation, to assess the relationship between cultural factors and safety. 
Soeters and Boer (2000) examined the safety records of members of the
North Atlantic Treaty Organization (NATO), specifically the air forces of 14 of
its member countries. Many NATO pilots operate the same aircraft, and
instructors from two of its member countries train almost all of its pilots. 
The air forces, although distinct, often practice together, follow the same 
procedures, and use similar criteria to select their pilots. They found a 
significant relationship between the accident rate of each country and 
the country’s score on Hofstede’s cultural dimensions, particularly indi- 
vidualism-collectivism. Countries with cultures that scored high on indi- 
vidualistic traits had lower accident rates than countries considered more 
group oriented. 

Since Hofstede published his findings, his work has come under criticism.
Tayeb (1994) criticized the methodology Hofstede employed, while
Chen (2008) and Heine, Lehman, Peng, and Greenholtz (2002) criticized both
the methodology and the interpretations Hofstede drew from the results of
his questionnaire administrations. McSweeney (2002) was perhaps the most
detailed in his criticism of Hofstede’s work. For example, among his criticisms
he writes,
Having assumed that the pertinent response differences were caused by
national values, Hofstede then supposes that the questionnaire response 
differences are decipherable manifestations of culture (cf. Kreweras, 
1982; Smucker, 1982; d'Iribarne, 1991). Despite the criticisms above of 
(this) assumption...let us temporarily assume it to be correct. It requires 
another analytical leap to assert that the cause may be identified through 
its assumed consequences. Disregarding this problem, Hofstede obfus- 
cates the questionnaire response differences with national culture. 
(p. 104) 


Hofstede (2002) responded to McSweeney’s criticism point by point, going 
as far as to label him an “accountant” as his academic appointment was 
in a business college. Overall, the criticisms of Hofstede’s work essentially 
address the datedness of the research, noting how much society and cultures 
have changed since the 1960s and 1970s when Hofstede collected much of 
his data, and the difficulties inherent to ascribing cultural dimensions to the 
results of Likert questionnaires. 

In truth, the criticisms have some validity. It is difficult to conceive of a 
researcher today who would label a cultural dimension as masculinity-fem- 
ininity and ascribe to the feminine side such traits as modesty and tender- 
ness. Moreover, to ethnographers and anthropologists, who spend extensive 
time inhabiting the cultures they study in order to observe, identify, and 
document norms and cultural traits, identifying cultural traits based on 
the results of a quickly completed questionnaire is difficult to accept. Thus, 
Hofstede’s dimensions can be criticized because they have remained static 
while cultures have continued to evolve, and because of the difficulties in the 
method used to derive them. Nonetheless, his work is still widely accepted 
by researchers and accident investigators who may intuitively accept the 
dimensions because of their simplicity and the virtue of their ready applica- 
tion to common observations of cultural differences. Ultimately, Sondergaard 
(1994) applied what may be an effective way to determine the validity of 
Hofstede’s work. “The widespread usage of Hofstede’s culture types beyond 
(the number of) citation(s),” he writes (p. 447), indicates “...validation of the 
dimensions by empirical research.” 





National Cultural Antecedents and Operator Error 


Regardless of what one thinks of Hofstede’s work and that of others who 
have employed questionnaires to identify cultural factors, the presence 
of cultural differences is widely accepted (e.g., Morris and Peng, 1994; 
Nisbett, Choi, Peng, and Norenzayan, 2001; Klein, 2005). The influence of 
cultural factors on system operations is evident; they influence a variety
of operator-equipment interactions, and can make the difference between
effective and erroneous performance. Nonetheless, identifying cultural 
factors as antecedents of operator error is difficult. For one, language dif- 
ferences, and difficulty in an operator’s communicating in a language dif- 
ferent from his or her mother tongue, may account for shortcomings in 
training and oversight. Further, linking an error to a cultural factor, which 
may have been subject to considerable criticism for the manner in which it 
was derived, is not easily accomplished. The factor has to be established by 
reputable research and be widely accepted. Further, the potential
presence of other factors that can serve as error antecedents makes
ascribing errors to cultural factors even more difficult. Korean
Air may have sustained a high accident rate because of cultural factors, for
example, but shortcomings in training and oversight, which may have had
little to do with Korean cultural factors, may have adversely affected operator
performance as well.

Strauch (2010) raised an additional difficulty with ascribing error causa- 
tion to cultural factors. “The very potential of sociotechnical systems for 
intense time pressure and severe consequences from errors,” he writes (p. 
249), “distinguishes system operators from respondents of the bulk of cul- 
tural research.” Administering questionnaires to office workers may result 
in identifiable cultural traits, but those traits may not hold true among those 
who operate complex systems. Office workers do not face potentially severe 
consequences from error, and they rarely make immediate decisions based 
on their recognition of the circumstances they are encountering. 

Strauch (2010) argues that to establish culture as an antecedent to error the
factor must meet two initial requirements:


* The factor must be strong enough to influence behavior 


* The cultural trait must be sufficiently influential to affect the par- 
ticular trait 


In identifying national culture as an antecedent to error, investigators must 
have sufficient support in the literature to identify the cultural trait in ques- 
tion as both attributable to a particular culture and sufficiently influential to 
affect an operator's performance. If these criteria can be met, investigators 
must then meet a third criterion, excluding other potential antecedents to 
error. As described previously, language difficulties or training shortcom- 
ings, for example, may be incorrectly attributed to cultural factors. 

In sum, the difficulties of establishing cultural factors as antecedents to error
are considerable, and despite strong beliefs on an investigator's part as to
their influence, the challenges to investigators in making such identifications 
may be formidable. Antecedents to error may well be culturally determined, 
but ascribing error to such factors in a way that meets the requirements of 
investigative logic may be difficult to achieve. 
Organizational Culture


Companies, like tribes, religious groups, and nations, can influence their
employees' behavior through the norms that they develop, norms that can be
as powerful an influence on employee behavior as can cultural norms (e.g., 
Schein, 1990, 1996). Numerous illustrations of organizational practices, even 
among companies seemingly dedicated to enhancing operational safety, 
demonstrate the potentially adverse effects of norms on safety practices. For 
example, the New York Times described poor organizational practices in the 
National Aeronautics and Space Administration after it had experienced sev- 
eral major project failures, years after the 1986 accident of its Space Shuttle 
Challenger. The article reported that, 


In candid reports assessing recent problems with the National 
Aeronautics and Space Administration’s (NASA) programs to explore 
Mars, two panels concluded that pressures to conform to the agency’s 
recent credo of “faster, cheaper, better” ended up compromising ambi- 
tious projects. To meet the new constraints, the reports said, project 
managers sacrificed needed testing and realistic assessments of the risks 
of failure. (Leary, 2000) 


Interestingly, the author indicated that NASA management had been criticized
for many of the same management practices and norms that had been identified
after the agency sustained the 1986 Space Shuttle Challenger accident.

As with national cultures, corporate cultural factors can affect safety, and 
become antecedents to error or mitigate opportunities for error. Moreover, 
employee groups within companies develop their own norms based on com- 
monly held professional standards and beliefs. Vaughan (1996) examined 
the influence of the cultures at NASA, its primary space shuttle contractor, 
Morton Thiokol, and the shared engineer culture of both, on the Challenger 
accident. She suggested that engineers and their supervisors at both Morton 
Thiokol and NASA had developed techniques of responding to the risky 
technology involved in space operations that minimized the perception of 
and appreciation for the risks inherent to the mission. Despite consider- 
able evidence suggesting that the low outside temperatures at the time of 
the launch could seriously degrade the integrity of the system, officials at 
both organizations agreed to the launch, which proceeded with disastrous 
results. 


Safety Culture 


Recent years have witnessed increased calls for companies to enhance their 
“safety cultures” as a means to improve the safety of their system opera- 
tions. The term safety culture itself is largely credited to the International 
Atomic Energy Agency, which found that the safety culture of the Soviet
Chernobyl nuclear facility created the circumstances that led to the accident. 
In response the International Nuclear Safety Advisory Group or INSAG, of 
the International Atomic Energy Agency, developed protocols for nuclear 
power facilities to enhance their safety culture (International Atomic Energy 
Agency, 1991). In the United States, two federal agencies have promulgated 
policies calling for the companies they regulate to implement programs to 
enhance their safety culture. One, the Nuclear Regulatory Commission, 
wrote in a policy statement: 


In the United States, incidents involving the civilian uses of radioac- 
tive materials have not been confined to a particular type of licensee 
or certificate holder, as they have occurred at nuclear power plants and 
fuel cycle facilities and during medical and industrial activities involv- 
ing regulated materials. Assessments of these incidents revealed that 
weaknesses in the regulated entities’ safety cultures were an underlying 
cause of the incidents or increased the severity of the incidents. (Nuclear 
Regulatory Commission, 2011, p. 34774) 


What is safety culture and why are regulators endorsing the concept and 
encouraging the companies they regulate to adopt safety cultures? According 
to another U.S. federal agency, the Bureau of Safety and Environmental 
Enforcement, safety culture is defined “as the core values and behaviors of 
all members of an organization that reflect a commitment to conduct busi- 
ness in a manner that protects people and the environment” (Bureau of 
Safety and Environmental Enforcement, 2013). 

Although the Bureau of Safety and Environmental Enforcement has
defined the term, there is little agreement as to what safety culture
refers to. This is largely because safety culture calls for the definition of two
distinct terms, safety and culture, both of which are difficult to define and/or
are defined differently according to one’s perspective or field of study.
As a result, according to Guldenmund (2000), “the concepts of safety cul- 
ture and safety climate are still ill-defined and not worked out well; there 
is considerable confusion about the cause, the content and the consequence 
of safety culture and climate, and the consequences of safety culture and 
climate are seldom discussed” (p. 247). 


Company Practices 


Aspects of a company’s culture are revealed in its selection policies, oper- 
ating procedures, and operational oversight, all of which can affect perfor- 
mance. Companies that operate complex systems are required to perform 
these tasks, but companies that are especially safety oriented will perform 
them more thoroughly, and at a higher level, than would be expected of 
others. Practices that encourage operator responsibility, professionalism, 
and participation in safety matters can enhance operator attention to safety
details; punitive practices do not. A company’s culture can also be reflected 
in its definitions of and response to employee transgressions. Companies 
that require extensive documentation of occasional and infrequent medical 
absences, for example, encourage their employees to report to work when ill, 
increasing the likelihood of errors. 

Frequent changes in managers, supervisory personnel, supervisory
practices, and operating procedures may reflect organizational instability, an
indication of a corporate cultural factor that could affect safety. Instability
can adversely affect operator performance by leading to frequent changes 
in interpretations and enforcement of policies and procedures. One supervi- 
sor may interpret procedures literally and expect the same interpretation 
and compliance from operators. Another may interpret the procedures dif- 
ferently, and permit operators to comply with those he or she considers most 
important, while ignoring those that are perceived to have little or no influ- 
ence on system operations. Instability can also signal dissatisfaction with the 
company and its practices. 

Previous company incidents and accidents can also reveal much about cor- 
porate commitment to safety. Numerous incidents and accidents relative to 
those of comparable companies suggest deficiencies in company practices, 
standards, and oversight. Similar issues found in multiple events may indi- 
cate an unwillingness to identify and address potential system safety haz- 
ards. On the other hand, thorough company investigations of incidents and 
accidents and sincere efforts to address identified safety deficiencies reveal 
aspects of a positive corporate culture. 

Investigators noted the adverse effects on safety of some organiza- 
tional norms in a January 1996 rail accident outside of Washington, D.C. 
(National Transportation Safety Board, 1996). A Washington Metropolitan 
Area Transit Authority subway train was unable to stop and struck a train 
stopped ahead on the same track, killing the moving train’s operator. The 
track had received a large amount of snow that had fallen throughout the 
day, reducing friction on the exposed tracks. Several times before the acci- 
dent the train operator had requested authorization to disengage automatic 
train control and operate the train manually to better control braking on the 
slippery track. However, the Authority’s director of operations had prohib- 
ited manual train control under any circumstances, in an effort to reduce 
train wheel wear. Supervisors were reluctant to violate his order and grant 
the train operator’s request, despite their awareness of the slippery track 
conditions. Not one of the supervisors believed that he had the authority to 
countermand the policy, even though all knew that adhering to it posed a 
threat to system safety. 

The lessons of this accident apply to others in a variety of settings. 
Corporate norms that encourage unquestioning acceptance of rules risk 
jeopardizing safety when they no longer apply, and companies that manage 
through fear will, over time, increase the probability of unsafe operations. 
High Reliability Companies


Companies that operate complex systems can establish practices that promote 
safe operations. After studying high-risk systems, Rochlin (1999) described 
what he calls “high reliability organizations.” Expecting to focus on avoid- 
ing errors and risk management, he found that some organizations tended 
to anticipate and plan for, rather than react to, unexpected events. They 
attended to safety while efficiently operating complex systems, rewarded 
error reporting, and assumed responsibility for error rather than assign- 
ing fault. These companies actively sought to learn from previous errors 
by maintaining detailed records of past events and applying the lessons of 
those events to system operations. As Rochlin writes, 


Maintenance of a high degree of operational safety depends on more 
than a set of observable rules or procedures, externally imposed training 
or management skills, or easily recognized behavioural scripts. While 
much of what the operators do can be formally described one step at a 
time, a great deal of how they operate, and more important, how they 
operate safely, is “holistic,” in the sense that it is a property of the inter- 
actions, rituals, and myths of the social structure and beliefs of the entire 
organization, or at least of a large segment of it. (p. 1557) 


Rochlin argues that an organization’s “interactions, rituals, and myths”— 
essentially its norms—can either help to create antecedents to error, or can 
anticipate and minimize their presence. An organization with a “good” cul- 
ture encourages safety, even at the expense of production. It fosters commu- 
nication among its employees and can proactively uncover problems in its 
operations. Westrum and Adamski (1998) offered techniques for companies 
to enhance safety through internal communications and error reporting. 

Reason (1990) describes the benefits of programs that allow employees to 
report mistakes without retribution. Industries in several countries have 
implemented such self-reporting programs, which have provided consider- 
able information about potential antecedents to error, before the antecedents 
could affect operator performance. The aviation industry has implemented a 
number of these programs, such as ASRS—Aviation Safety Reporting System 
in the United States, and CAIR—Confidential Aviation Incident Reporting, 
in Australia. 





Organizational Cultural Antecedents and Operator Error 


Identifying organizational cultural antecedents to an operator's error calls 
for the same steps used to identify organizational factors as error anteced- 
ents outlined in Chapter 6. Consequently, investigators need to determine 
whether the company (1) acted or decided improperly in
the face of information alerting it to the need for different actions or decisions,
(2) acted or decided improperly in the face of self-evident information
of the need for corrective action, or (3) took no action or made no decision
when an action and/or decision was warranted.




Summary 


Cultures develop norms that influence the values, beliefs, expectations, 
behaviors, and perceptions of their group members. Recent studies have
identified several cultural factors that can distinguish among members of
different cultures and influence system safety. These include power distance
(the perceived differences between superiors and subordinates),
individualism-collectivism (the extent to which people accept their own goals
relative to those of their group), and uncertainty avoidance (the willingness
to deal with uncertain situations). Although there is disagreement about these
particular cultural factors, most researchers agree on the influence of culture 
on individual behaviors. However, establishing a link between cultural fac- 
tors and an operator’s error is difficult. 

Companies also develop norms through their actions, statements, prac- 
tices, and policies. Hiring criteria, training programs, operating procedures, 
and oversight reflect these norms. Some companies actively encourage safety 
by recognizing and addressing potential operational hazards. Differences 
between “good” and “bad” organizational cultures are suggested. 


DOCUMENTING CULTURAL ANTECEDENTS 
NATIONAL CULTURE 


* Refer to existing research to determine the effects of norms 
that are believed to influence safety (e.g., Hofstede, 1980, 1991; 
Helmreich and Merritt, 1998; Soeters and Boer, 2000; Helmreich 
et al., 2001), when national cultural issues need to be examined. 


ORGANIZATIONAL CULTURE 


* Identify organizational cultural factors and assess their effects 
on system safety by interviewing employees at all pertinent 
company levels, and examining written documentation such 
as memos, organizational policies, and procedures. 
* Document the extent to which company policies are enforced,
and the extent to which operator expectations regarding com- 
pany enforcement practices are met. 


* Identify company-recognized transgressions and the penalties 
it administers to operators who transgress. 


* Evaluate the comprehensiveness of selection practices, training 
programs, maintenance practices, and operational oversight 
and, if possible, compare them to other companies in the same 
industry. 


* Document managerial and operator turnover each year for a 
period of several years from the time of the event, and the rea- 
sons given for employees leaving the company. 


* Determine the number of incidents and accidents the company 
has experienced over several years from the time of the event, 
assess the comprehensiveness of the organizational investigation 
of the events, identify common issues that may be present, 
and document the remediation strategies the company has implemented 
to prevent future events. 

* Document the resources that companies have devoted to pro- 
grams that directly affect system safety, such as self-reporting 
error programs, rewards for suggestions to enhance safety, and 
efforts to remain current with industry and government safety 
programs. 
Operator Teams 











In [the University of] Miami's pass-oriented offense, they [the offen- 
sive linemen] do this by acting as one, a solid wall, so that their indi- 
vidual achievement is less visible than their group achievement. The 
Cane’s offensive line is the best such group in the country, Gonzalez 
says, because they are selfless and because they adjust to one another’s 
strengths and weaknesses. They act as a unit, both on and off the field. 


Jordan, 2001 
New York Times Magazine 
Introduction 


Complex systems usually call for two or more individuals to operate equip- 
ment. The sheer complexity of these systems, the number of tasks that must 
be performed, and the amount of information that must be processed call for 
multiple operators. Further, with additional operators, should a team mem- 
ber commit an error, another could correct it or minimize its consequences. 
The use of operator teams also enables operators to share duties as needed, 
helping to balance individual workload as operating cycles change. 
Multiple operators working together to oversee system operations form 
operator teams. Teams have certain characteristics, as Dyer (1984) describes, 


A team consists of (a) at least two people, who (b) are working towards 
a common goal/objective/mission, where (c) each person has been 
assigned specific roles or functions to perform, and where (d) comple- 
tion of the mission requires some form of dependency among the group 
members. (p. 286) 


This definition, applying to all teams regardless of their contexts, helps 
explain the common objective of operator teams in complex systems: safe 
and effective system operation. 

Operator teams offer several advantages over single operators and an 
extensive body of research supports the efficacy of operator teams in com- 
plex systems. As Salas, Grossman, Hughes, and Coultas (2015) write, 
Teams are advantageous to individuals in many ways. They pool diverse 
knowledge and skills, allowing for convergent and divergent thinking, 
the building blocks of creativity and knowledge generation (Hoegl and 
Parboteeah, 2007). They also provide a source of backup and assistance 
for overworked or underskilled team members, and can be a source of 
positive affect and increased morale (Salas, Sims, and Burke, 2005). They 
allow sharing workload so that some operators do not become over- 
whelmed by task requirements during certain operational phases, they 
allow specialization among team members, so that different types of 
expertise can be brought to the system, enlarging the scope of expertise 
possible with only one operator, and they allow operators to observe the 
others’ performance, and prevent or mitigate the effects of errors before 
they can lead to serious consequences. 


So influential has the use of teams become in complex systems that teams 
have been characterized as “the strategy of choice when organizations are 
confronted with complex and difficult tasks” (Salas, Cooke, and Rosen, 2008, 
p. 540). 

The value of operator teams can be seen in several accidents in which teams 
provided greater levels of safety than could single operators. For example, in a 
1989 accident, a McDonnell Douglas DC-10 experienced a catastrophic engine 
failure that severed all hydraulic lines, leading to the loss of hydraulic systems 
and with that the loss of airplane control (National Transportation Safety Board, 
1990). Fortunately, an instructor pilot who was seated in the cabin quickly rec- 
ognized the severity of the problem and offered to assist the pilots. The instruc- 
tor pilot had earlier practiced controlling and landing a DC-10 with a similar 
hydraulic failure in a flight simulator, using extraordinary and unconventional 
control techniques. He then guided the crew, helped manipulate the available 
controls, and assisted in bringing the airplane to an emergency landing. Their 
joint efforts saved the lives of over half the passengers and crew. 

Yet operator teams, while beneficial to system safety, can also allow the 
introduction of unique errors into systems because of the potential for errors 
resulting from interactions within the teams. In such cases multiple operator 
teams not only do not enhance safety, they can degrade it by creating unique 
error antecedents. This chapter will describe elements that contribute to 
team effectiveness, types of team errors, and how to identify and determine 
the effects of the error antecedents associated with operator teams. 





What Makes Effective Teams 
Leadership 


Effective leaders are necessary for effective teams. Burke et al. (2006a) con- 
ducted a meta-analysis of the team performance literature to determine the 
type of leadership skills necessary for effective leadership. They observed 
that a leader “is effective to the degree that he/she ensures that all functions 
critical to task and team maintenance are completed” (Burke et al., 2006a, 
p. 289). They note that effective leaders carry this out by performing essen- 
tially two tasks: (1) overseeing team accomplishment of particular tasks and 
(2) facilitating team interaction and development. In their view leaders need 
to be both task oriented and people oriented to be effective. 

Burke, Sims, Lazzara, and Salas (2007) sought to determine what makes 
team members follow effective leaders. They found that effective leaders 
engender trust among team members to enable them to follow their lead- 
ership. Among other characteristics, they suggest that trust is based upon 
leaders providing both compelling direction to the team so that its members 
perceive the tasks as challenging, clear, and consequential, and an enabling 
structure that facilitates the team’s accomplishing the necessary tasks. 
Leaders are seen to genuinely care about the well-being of their subordinates 
and treat them fairly, manifest integrity as leaders, and provide a safe envi- 
ronment for their subordinates to express their views without fear of risk. 

Bienefeld and Grote (2014) examined the effectiveness of leadership in a 
system in which multiple teams worked together within the larger system 
to achieve a common goal. Using a scenario based on an aircraft accident in 
which the pilots delayed landing after being informed of an inflight fire, they 
assessed how well teams of flight attendants and teams of pilots worked 
together, in their respective duties, to communicate the critical information 
pilots needed to commit to land the airplane as quickly as possible, to pre- 
pare the cabin and the passengers for the landing, and to communicate with 
each other to share critical information as needed. They found that “shared 
leadership” (p. 281), in which leaders of the respective teams led the teams 
in their tasks, working together toward the common goal, was a “powerful 
predictor” (p. 281) of the success of the teams in meeting the common goal. 


Teamwork 


Researchers have also focused on the role of team members, and factors 
that influence the extent to which they effectively work together to meet 
the common goal, that is, teamwork. Salas, Sims, and Burke (2005) suggest 
five core components, each of which is needed for effective teamwork: 
team leadership, mutual performance monitoring (the extent to which team 
members monitor each other's performance to catch errors), backup behavior 
(providing resources to team members when needed), adaptability (recog- 
nizing and readjusting performance to respond appropriately to deviations), 
and team orientation (tendency for team members to enhance each other’s 
performance while performing group tasks). These may be manifested 
differently among different teams, according to the demands on the team 
engendered by the particular circumstances. Teams work together through 
what Salas et al. (2005) describe as three coordinating mechanisms: shared 
mental models among team members of the situation being encountered and 
the appropriate team response; closed-loop communications, in which team 
members effectively communicate with each other (i.e., provide and understand 
communications as necessary); and, lastly, mutual trust. 

Driskell, Goodwin, Salas, and O’Shea (2006) examined attributes that con- 
tribute to effective team member performance. Recognizing that teamwork 
requires skills in both performing critical system-related tasks and in interact- 
ing with team members, they studied the interpersonal skills needed for team- 
work. They proposed five sets of skills that were needed to interact effectively 
with other team members: emotional stability, that is, lack of anxiety and being 
calm and self-confident; extraversion, which includes team orientation, social 
perceptiveness and expressivity, as well as the ability to subjugate desires for 
dominance; openness, that is, flexibility and openness to experience; agreeableness, 
which includes kindness, trust, and warmth; and finally conscientiousness, 
which includes achievement striving and dependability. “We assume,” they 
write, “...that team members who possess these personality facets will be more 
effective under specified conditions than those who do not” (p. 265). 

Salas, Grossman, Hughes, and Coultas (2015) focused on the role of team 
cohesion, the extent to which team members want to work together, in team 
effectiveness. They found that team cohesion is a multi-dimensional trait 
that incorporates both the task and interpersonal elements of teamwork. 
They argue that cohesion is a critical element for team effectiveness. 

DeChurch and Mesmer-Magnus (2010) examined the role of shared mental 
models in team effectiveness, defining them as “knowledge structures held 
by members of a team that enable them to form accurate explanations and 
expectations for the task, and in turn, to coordinate their actions and adapt 
their behavior to demands of the task and other team members” (p. 2). They 
found that, regardless of the measurement technique used, the research 
demonstrates that shared mental models among team members predict the 
efficacy of team performance. Burke et al. (2006b) examined adaptation 
(or adaptability) and its role in team effectiveness. They suggest that 
the ability of teams to adjust their actions according to situational needs, 
that is, dealing with performance obstacles through innovation and adopt- 
ing new routines, underlies their effectiveness in adapting to the situation, 
and thus in engendering team effectiveness. 





Team Errors 


Operators in any complex system can commit errors, but certain errors 
can only be committed by operator teams. For example, Janis (1972) identified 
errors that teams of highly qualified individuals committed in several 
prominent historical events, such as the decision to invade the Bay of Pigs in 
Cuba in 1961, and the failure of Admiral Husband Kimmel, the commander 
of U.S. forces in Pearl Harbor in 1941, to prepare for a Japanese assault.* Janis 
suggests that the cohesiveness of select groups and their subtle deference 
to respected leaders can lead to what he termed “groupthink.” Groups that 
succumb to groupthink have difficulty considering ideas or assessing sit- 
uations that are contrary to their often unspoken norms. Groupthink has 
since become an accepted construct in psychology, to explain certain types 
of group decision-making errors (e.g., Salas et al., 2005). 

Teams need time to develop the necessary cohesiveness and deference to 
the leader that groupthink requires. However, these are not characteristic of 
complex systems. Although severe consequences resulted from the group- 
think errors Janis cited, the environments in which the errors were com- 
mitted were relatively static and the team members had sufficient time to 
evaluate the costs and benefits of decision options. That is not the case in 
complex systems where teams face time pressure, uncertainty, and poten- 
tially severe consequences from errors. The literature on team performance, 
which identifies factors necessary for team effectiveness, has implied that the 
lack of such factors contributes to team errors. Investigators need to be able 
to identify unique team errors and their antecedents when describing opera- 
tor errors in complex systems because most such systems employ operator 
teams within the systems. While it may take a single engineer, for example, 
to operate a locomotive, it takes dispatchers working with the engineers (and 
conductors in some railroads) to ensure that tracks are clear, signals are cor- 
rect, and that crossovers or switches are properly aligned. 

DeChurch and Zaccaro (2010), as did Bienefeld and Grote (2014), looked at 
multi-team systems and system breakdowns. They suggest that the inter- 
dependence of different teams working together in a complex system can 
create difficulties that can lead to errors. As they note (p. 331), “systems fail 
more often because of between team breakdowns than because of within- 
team breakdowns,” and that as teams become increasingly cohesive, the 
boundaries between teams may strengthen, potentially diminishing multi- 
team interdependence. Wilson, Salas, Priest, and Andrews (2007) studied the 
cause of a particular type of team error, fratricide in military environments, 
to develop a taxonomy of team breakdown causes. By examining one type 
of error, team breakdowns, they described how errors in communication, 
coordination, or in cooperation can lead to errors in team cognition that can 
lead to fratricide events. 





* Since Janis completed his work, historians have reexamined Admiral Kimmel's role in the 
lack of effective preparations against the Pearl Harbor attack. A number believe that some in 
the U.S. government, while not knowing of the Pearl Harbor attack in advance, had critical 
intelligence of possible Japanese military strikes in the Pacific region, which they did not 
share with Admiral Kimmel. 
Operator Team Errors 


It can be seen that features of teams, team leaders, team members, and the 
environments in which they operate influence the likelihood of operator 
team errors. The roles of the operators, companies, equipment designers, and 
regulators, among others, in influencing operator error have previously been 
discussed. In this chapter errors and antecedents characteristic of teams 
operating complex systems will be examined. 

In addition to the errors that any individual operator can commit, operator 
teams can commit these types of errors: 


* Failing to notice or respond to another's errors 

* Excessively relying on others 

* Inappropriately influencing the actions or decisions of others, and 
* Ineffectively delegating team duties and responsibilities 


These errors or breakdowns in team effectiveness, which have been 
described previously using the terms developed by the researchers themselves, 
have been adopted in this text to facilitate the ability of investigators to 
apply the concepts from research to error investigations. 


Failing to Notice or Respond to Another’s Errors 


The most common type of operator team error is committed when operators 
fail to notice or respond to the errors of other team members, a breakdown in 
what Salas et al. (2005) refer to as mutual performance monitoring and backup 
behavior. This error may result from any number of antecedents, such as one 
operator not attending to or not monitoring the actions of the other. However, 
in some circumstances operators notice the errors of others but fail to respond 
appropriately, a failure that may be due to cultural influences, as was dis- 
cussed in Chapter 8. Such an error negates one of the critical advantages of the 
use of teams, catching or mitigating the effects of the errors of other operators. 

The National Transportation Safety Board (1994) studied errors in 37 acci- 
dents that occurred in the team environment of a complex system—the 
cockpit of air transport aircraft. They found that one error, failing to moni- 
tor/challenge the performance/errors of another, was one of the two error 
types they noted that were specific to operator teams. The other, which the 
National Transportation Safety Board referred to as resource management, 
will be discussed shortly. 


Excessively Relying on Others 


This type of error can occur when operators possess different types or levels 
of expertise. It can lead to severe consequences when operators rely on other 
team members to such an extent that they fail to perform their own tasks 
effectively. 

Junior operators, who typically work alongside those with more experi- 
ence, seniority, authority, or status, occasionally commit this type of error. 
They may disregard their own knowledge and rely excessively on others to 
decide, act, or perform critical duties. A team error could result if the person 
being relied upon makes an error, or if his or her skills or knowledge are 
inadequate for the task requirements. This error is highlighted in the case study 
of this chapter. 


Inappropriately Influencing the Actions or Decisions of Others 


Operators may not have sufficient time to effectively assess the situations 
they encounter, and in the occasionally stressful, uncertain environment that 
complex system operators may encounter, one operator could exert extraor- 
dinary influence on the situation awareness and subsequent actions of the 
others. In highly dynamic conditions an operator who assesses a situation 
incorrectly could adversely affect another's situation awareness, even if that 
person had initially assessed the situation accurately. The operator with the 
inaccurate situation assessment could then create an operator team error by 
interfering with the assessments of other team members. 

A 1989 Boeing 737-400 accident illustrates how in ambiguous situations one 
operator can inappropriately influence the other (Air Accidents Investigation 
Branch, 1990). About 13 minutes after takeoff, the pilots felt what investiga- 
tors termed “moderate to severe vibration and a smell of fire.” The flight 
data recorder (FDR) showed that, at that time, the left engine was vibrating 
severely and exhibiting other anomalies. According to investigators, 


The commander took control of the aircraft and disengaged the autopi- 
lot. He later stated that he looked at the engine instruments but did not 
gain from them any clear indication of the source of the problem. He also 
later stated that he thought that the smoke and fumes were coming for- 
ward from the passenger cabin, which, from his appreciation of the air- 
craft air conditioning system, led him to suspect the No. 2 (right) engine. 
The first officer also said that he monitored the engine instruments and, 
when asked by the commander which engine was causing the trouble, he 
said “It's the le...It’s the right one,” to which the commander responded 
by saying “Okay, throttle it back.” The first officer later said that he had 
no recollection of what it was he saw on the engine instruments that led 
him to make his assessment. The commander’s instruction to throttle 
back was given some 19 seconds after the onset of the vibration when, 
according to the FDR, the No. 2 engine was operating with steady engine 
indications. (p. 5) 


Forty-three seconds after the onset of the vibrations, the commander 
ordered the first officer to “shut it down,” referring to the right engine. 
Investigators found that, although the left engine of the two-engine air- 
plane had sustained substantial internal damage, damage that had caused 
the vibrations that the pilots observed, they incorrectly shut down the right 
engine in the mistaken belief that it was the one that was causing the difficulties. 
That engine was later found to have been undamaged before the accident. 
Rather, the left engine, the one the pilots believed to have been operating 
effectively, was found to have been damaged. The pilots recognized this only 
moments before the accident, when it was too late to restart the right engine 
and avoid the impact. The aircraft crashed short of the runway, striking a 
motorway. Although no one on the ground was injured, 47 passengers were 
killed and 74 passengers and crew were seriously injured in the accident. It is 
possible, if not likely, that had the first officer said nothing, the captain, 
with his experience, would have correctly identified and responded to 
the engine that had failed. 


Failing to Delegate Team Duties and Responsibilities 


Operators must attend to ongoing system operations when responding to 
emergencies. These can create considerable demands on their attention, and 
can lead to errors in either the emergency response or in system operations. 
In situations such as these operator teams can effectively respond to the dif- 
ferent operational requirements, responding to the emergency and man- 
aging system operations, provided team leaders delegate responsibility to 
operators to both operate the system and respond to the anomaly in these 
situations. Failing to delegate tasks to team members in nonroutine situa- 
tions can lead to a team error as the response to either the anomaly, or the 
system operation, or both, may be erroneous. 

A team of three pilots committed this type of error in a 1972 accident involv- 
ing a Lockheed L-1011 that crashed in the Florida Everglades, a vast national 
park in South Florida (National Transportation Safety Board, 1973). The three 
had put the airplane into a hold over the Everglades while they attempted to 
determine the cause of an indicator failure. The indicator, which had failed 
to illuminate, signified the status of the landing gear, whether extended or 
retracted. The three pilots attended to the indicator light but not to the air- 
plane's flight path and as a result they did not notice that the mechanism that 
controlled the airplane's altitude had disengaged. The airplane slowly lost 
altitude until it struck the ground. 

Researchers at the National Aeronautics and Space Administration later 
studied this type of error. Using a Boeing 747 flight simulator they examined 
the response of pilots to a system anomaly (Ruffell-Smith, 1979). Using actors 
as cabin crewmembers they presented pilots with a scenario that included 
an anomaly, and then tried to distract the pilots with nonessential ques- 
tions from the “cabin crewmembers.” Several pilots responded to the “cabin 
crewmembers,” became distracted, and committed errors that exacerbated 
the severity of the situation. As in the Everglades accident, the pilots in the 
simulator allowed an anomaly to become a serious event by failing to ensure 
that team members were monitoring the system, and by not ignoring non- 
critical distracters to enable them to focus on more critical tasks during the 
high workload periods being observed. 

In response to these findings, and to those of several accident investiga- 
tions, airlines and the research community developed crew resource man- 
agement (or CRM) to help crewmembers contribute effectively to both routine 
and nonroutine system operations. The programs stressed the need for clear, 
unambiguous delineation and assignment of operator duties and responsi- 
bilities in response to nonroutine situations (e.g., Foushee and Helmreich, 
1988; Helmreich and Foushee, 1993; Helmreich and Merritt, 1998). Today, 
CRM is widely accepted in aviation, marine, and rail operations, and other 
systems where operator teams are used to control the systems. It has evolved 
to where it no longer merely strives to improve team performance in general 
but to enhance operator teams’ abilities to mitigate error. As Salas, Burke, 
Bowers, and Wilson (2001) write, CRM 


represents an awareness that human error is inevitable and can provide 
a great deal of information. CRM is now being used as a way to try to 
manage these (human) errors by focusing on training teamwork skills 
that will promote (a) error avoidance, (b) early detection of errors, and 
(c) minimization of consequences resulting from CRM errors. (p. 642) 


Unfortunately, the research findings on the safety effects of CRM have not 
been consistent. Helmreich, Merritt, and Wilhelm (1999), in observing actual 
flights, found that CRM training improved the quality of crew performance. 
However, reviews of the efficacy of CRM programs have been uncertain. 
Salas et al. (2001), Salas, Wilson, Burke, and Wightman (2006), and O’Connor 
et al. (2008), respectively, conducted meta-analyses of research on CRM to 
determine the consensus of research on the efficacy of CRM. The findings 
of each study were consistent in that CRM training changed operator atti- 
tudes regarding teamwork and team effectiveness. However, with regard to 
changing behaviors and to improving safety, the results were mixed. 





Operator Team: Antecedents to Error 


Antecedents that lead to individual operator errors can also lead to team 
errors. In addition, antecedents unique to operator teams can lead to team 
errors. Antecedents to individual operator errors can harm team effec- 
tiveness by degrading the performance of a member of an operator team, 
thereby leading to team errors. Antecedents that degrade team performance 
are unique to multiple operators, but the effects may be the same in terms of 
their adverse influence on team effectiveness. 
Equipment 


As discussed in Chapter 4, features of information presentation and control 
design affect operator performance. Those that apply to single-operator sys- 
tems apply to operator teams as well, with some additions. These pertain to 
an operator's ability to (1) interact with other team members, (2) access the 
information presented to other team members, and (3) control the system for 
other team members. 

Equipment designed for multiple operators enables team members to 
communicate with each other when needed, access critical information, and 
maintain system control. However, design features of some systems may 
interfere with team performance (e.g., Bowers, Oser, Salas, and Cannon- 
Bowers, 1996; Paris, Salas, and Cannon-Bowers, 2000). For example, some 
designs prevent operators from learning of other team members' control 
inputs and the system information they receive. Actions on keyboards or 
touchscreens, for example, are not as salient to other team members as are 
lever movements. Replacing levers with touchscreens can decrease the abil- 
ity of team members to learn of their colleagues' control inputs, thereby 
degrading both individual and team situation awareness. 


Operator 


Physiological and behavioral antecedents that lead to errors in single-opera- 
tor systems can affect operator teams as well. The influence of these anteced- 
ents on operator performance in single operator or operator team systems is 
comparable. 


Company 


Companies have substantial influence on the quality of the teams they 
employ, as discussed in Chapter 6. Companies evaluate candidates' inter- 
personal skills, in addition to their technical expertise, when selecting 
candidates for operator team positions. Those who are unable to interact 
effectively with other team members can adversely affect the quality of their 
team's performance and companies should screen applicants to assure that 
those it hires can interact effectively with team members. 

Foushee (1984) described an incident on an air transport aircraft in which 
a captain demeaned a first officer and expressly discouraged him from pro- 
viding input to the conduct of the flight. The captain's actions degraded the 
quality of team interactions by belittling a team member and thus discour- 
aging him from contributing to team effectiveness. Because of the captain's 
behavior, the first officer was less likely to speak up in response to an error 
of the captain, or even may have been unwilling to mitigate the effects of that 
error. Since then operators in many cultures have developed little tolerance 
for such actions by individuals in positions of responsibility in safety-critical 
systems. Their behavior reflects on the company’s selection criteria, training, 
and oversight as much as on them as individuals. 

The quality of a company's procedures can also affect the quality of team 
member interaction and serve as an antecedent to operator team errors. For 
example, companies can require operators to challenge and respond to each 
other, so that one verifies that another has performed a task, or one con- 
firms that he or she received information from the other. Procedures can 
increase the level of operator contribution to team tasks, encourage operators 
to observe and participate in aspects of each other’s performance, and reduce 
antecedents to error among team members. 


Number of Operators 


The number of operators performing a given task can influence the quality of 
the task. An excessive number can degrade communications within a group 
and lower individual workload to the point that operators become bored, 
adversely affecting system monitoring and other aspects of performance 
(O’Hanlon, 1981). However, because of financial concerns, most organizations 
are more likely to have too few rather than too many operators, and as 
a result, excessive team size will not be considered further. 

Situations in which the available number of operators is insufficient for the 
tasks to be performed occasionally occur, especially during nonroutine situ- 
ations. An insufficient number of operators can increase operator workload, 
increase individual team member stress, and reduce levels of operational 
safety (Paris et al., 2000). 

Yet, the dynamic nature of complex systems can make it difficult to plan 
for a constant workload. Operating cycles, with differing operator workload 
requirements, need different numbers of operators. A team that has a suf- 
ficient number of operators for one operating phase may have an insufficient 
number for another, and a team that is insufficient for nonroutine opera- 
tions may be excessive for routine conditions. The adequacy of the number 
of operators assigned to a task will vary according to the operating phase, its 
complexity, and the level of operator workload. 
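
The dependence of staffing adequacy on operating phase can be illustrated with a rough sketch. The phase names, task and time figures, and the two thresholds below are hypothetical, chosen only to show the kind of comparison an investigator might make between the work demanded and the operator time available. They are not drawn from the studies cited above.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    task_minutes: float       # operator-minutes of work the phase demands (hypothetical)
    available_minutes: float  # clock time available to complete that work (hypothetical)


def workload_ratio(phase: Phase, operators: int) -> float:
    """Work demanded divided by the total operator time available."""
    return phase.task_minutes / (operators * phase.available_minutes)


def classify(phase: Phase, operators: int, low: float = 0.3, high: float = 1.0) -> str:
    """Label staffing for one phase; 'low' and 'high' are assumed cutoffs."""
    ratio = workload_ratio(phase, operators)
    if ratio > high:
        return "insufficient"   # more work than the team has time for
    if ratio < low:
        return "excessive"      # so little work that boredom becomes a risk
    return "adequate"


phases = [
    Phase("routine", task_minutes=20, available_minutes=60),
    Phase("nonroutine", task_minutes=50, available_minutes=10),
]

for phase in phases:
    print(phase.name, classify(phase, operators=2))
```

With these assumed figures, the same two-operator team is labeled excessive for the routine phase and insufficient for the nonroutine phase, which is the pattern described above.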


Team Structure 


Operators in teams work best when each team member understands his or 
her tasks, and contributes to the work of the other team members without 
interfering with their tasks (Paris, Salas, and Cannon-Bowers, 1999). Teams 
in which members are uncertain of their roles and responsibilities are said 
to have poor structures. However, as with team size, a team structure that is 
effective for routine operations may be ineffective for nonroutine situations. 
As seen in the 1972 accident involving the Lockheed L-1011 that crashed 
in the Florida Everglades, a team structure effective for routine operations
could break down in nonroutine circumstances. While responding to
what turned out to have been a relatively benign situation, a visual alert that
did not illuminate as expected, the team members failed to monitor a critical
element of system operations, the airplane's altitude.

Some have found that operators’ roles within their teams can affect their 
situation awareness and other critical performance elements. For example, 
in commercial aviation one pilot typically performs the flying duties and 
the other monitors the subsystem performance and supports the “flying” 
pilot, although the captain remains in command throughout. On subse- 
quent flights they generally alternate duties as pilot flying and pilot not 
flying. Jentsch, Barnett, Bowers, and Salas (1999) reviewed over 400 anony- 
mous reports that pilots had filed describing their own errors to an anony- 
mous self-reporting system. They found that captains were more likely to 
lose situation awareness when they were flying the airplane, that is, actively 
engaged in system control, and when the subordinate pilots, the first offi- 
cers, were monitoring the captains’ performance. Captains were less likely 
to lose situation awareness when the first officers were performing the flying 
duties and they were monitoring the others’ performance. The findings con- 
tradict a common belief that active engagement in system control enhances 
situation assessment. The researchers suggest that monitoring gave the cap- 
tains the ability to observe system parameters and obtain situation aware- 
ness better than would have been true had they been flying the aircraft 
themselves. In addition, the superior-subordinate positions of captains and 
first officers, which made the latter somewhat reluctant to alert the captains 
to their errors when they were the flying pilots, may have also contributed 
to the captains’ reduced situation awareness when they were flying. 


Team Stability 


Team stability, the extent to which team members remain together as a work- 
ing team, can also affect the quality of team performance. Working together 
allows team members to learn about each other’s performance and work 
styles, and to develop reasonably accurate expectations of each other’s near-term
performance, as often occurs with members of athletic teams who have 
played together over a period of time. The players and the team members 
learn subtle aspects of each other’s performance over time that enable them 
to reliably predict each other's actions, facilitating communications and 
enhancing performance. In emergency operations, when operators may face 
intense workload and have little available time, stability can lead to enhanced 
communications as the operators accurately anticipate each other's actions 
without articulating them. 

However, in some systems long-term stability may not be possible. 
Contractual obligations and prevailing customs may dictate different 
work schedules among team members with different levels of seniority.
Several airlines, for example, employ thousands of pilots, many of whom 
fly with pilots they had neither flown with nor even met previously. In 
systems such as these, the consistent application of standard operating 
procedures can compensate for the lack of team stability. Well-defined and 
practiced procedures enable operators to anticipate their fellow operators’ 
actions in each operating phase, during both routine and nonroutine situ- 
ations, regardless of the length of time they had been teamed with the 
other operators. 

Companies can also use operator selection to counteract the potentially 
adverse effects of team instability. Paris et al. (2000) suggest that the most 
critical determinant of the effects of instability on team performance is the 
skill level of the operator leaving the team. “As a general rule,” they note, 
“there is little disruption of team performance from turnover, as long as only 
one team member is replaced at a time and that replacement is as skilled as 
the person he replaced” (p. 1060). 


Leadership Quality 


Leadership has been discussed previously in this chapter. Team leader qual- 
ity can affect team performance quality, particularly during critical situa- 
tions. The research is consistent in that team leaders in complex systems 
contribute to the climate in which the group operates, whether autocratic, 
democratic, or something in between. Leaders implement rewards and pun- 
ishments, and assign tasks. In these and other daily interactions, leadership 
quality affects team performance quality. Military organizations, where 
adherence to a superior’s orders is required, recognize that effective lead- 
ers elicit superior team member performance rather than compel it. As a 
result, leaders are encouraged to obtain voluntary cooperation from their 
subordinates rather than demand what can become reluctant or grudging 
cooperation. 

In commercial aviation early CRM programs addressed leadership quality 
as a critical element for successful interaction between superior and subor- 
dinate pilots. Good leaders attend to both operating tasks and subordinate 
concerns. Later CRM programs addressed additional elements of team per- 
formance and broadened the scope of team membership to include other 
operators in the system. 


Cultural Factors 


Cultural factors and their effects on system performance are discussed 
extensively in Chapter 8. Suffice to say that different cultures respect and 
defer to leaders, rules, procedures, and teams differently, and their values 
can affect the quality of operating team performance. 
Case Study


The relationship of operator team antecedents to errors can be seen in
the collision of two commercial aircraft, a McDonnell Douglas DC-9 and a 
Boeing 727, in heavy fog at Detroit Metropolitan Airport in December 1990 
(National Transportation Safety Board, 1991). The DC-9 was destroyed and 
eight of the 44 people onboard were killed in the accident, although no one 
on the Boeing 727 was injured. During severely limited visual conditions, 
the DC-9 pilots mistakenly taxied their aircraft onto an active runway and 
into the path of the Boeing 727 as it was taking off. The Boeing 727 pilots 
were unable to see the DC-9 in time to prevent the collision. 

Heavy fog places substantial burdens on both pilots and controllers at air- 
ports that lack ground radar, as was the case at the Detroit airport at the time of 
the accident. If visibility is sufficiently limited, pilots are unable to see beyond 
a short distance in front of their airplanes, and they would be prohibited from 
taking off or landing. Conditions at the time of this accident approached but
did not exceed the visibility limits. Even so, Detroit air traffic controllers
were unable to see taxiing aircraft from the control tower, and pilots had
difficulty establishing their positions at the airport. In these conditions,
when planes can still operate but visibility is quite limited, both controllers and pilots
rely on each other for airplane location information. Controllers depend on 
the pilots to inform them of their positions at the airport, and pilots depend 
on the controllers to separate them from other aircraft. 

The limited visibility added to the workload of both pilots and controllers. 
Controllers were unable to verify airplane positions and pilots lacked many 
of the visual cues needed to verify their positions on the airport. Markings 
that had been painted on runways and taxiways and served as guides to 
pilots were also not visible because a thin layer of snow had obscured them. 

The operator team on the DC-9 consisted of two pilots: the captain, the
superior, and the first officer, the subordinate. The captain was making his
first unsupervised air transport flight after a 6-year hiatus, following his
recovery from a medical condition that was unrelated to his aviation duties.
In the 6-year interval between his medical disqualification and his return to
flying, the airline that had employed him had been purchased by another
airline. When he returned to flying, the captain had not only to requalify to
operate the DC-9 but also to learn his new employer’s operating procedures.

The DC-9 first officer had retired from the U.S. Air Force, where he had 
been a pilot in command of large bomber aircraft. At the time of the acci- 
dent he was within his probationary first year period with the airline. The 
airline’s personnel rules allowed it to terminate the employment of pilots in 
their probationary periods without cause. In the 6 months since he joined 
the airline, he had flown into and out of Detroit 22 times, only one or two
of which, according to his estimates, were in restricted visibility conditions.

Cockpit voice recorder data revealed that the first officer exaggerated
attributes of his background to the captain. Unsolicited, he told the captain
that he had retired from the air force at a rank that was higher than the rank
that he had actually attained. He claimed to have experienced an event when
flying combat operations, before joining the airline, that he had not
experienced. During the taxi from the gate he committed several errors while
ostensibly assisting the captain. Despite the restricted visibility and his own
uncertainty about some locations, he unhesitatingly gave their location to the
captain, and he continued to do so after misdirecting the captain on the
airport surface, thereby requiring the air traffic controllers to give them a
new taxi clearance to the active runway.

Although in this type of airplane captains steer the airplane while taxi- 
ing, both pilots work together using airport charts and external visual infor- 
mation to verify that the taxi route they follow corresponds to the assigned 
route. Both also simultaneously monitor air traffic control communications 
for pertinent information, and they monitor the aircraft state. 

Early in their taxi the first officer told the captain, “Guess we turn left here.” 
The captain responded, “Left turn or right turn?” The first officer answered by
describing what he believed to be their position. The captain then answered, 
“So a left turn,” and the first officer agreed. Ten seconds later he directed 
the captain, “Go that way” and the captain complied. This type of exchange 
between the captain and first officer continued for about 2 minutes, until the 
first officer, in response to an air traffic controller’s question about their loca- 
tion answered, “Okay, I think we might have missed oscar six,” the name of 
the taxiway to which they had been assigned. By misdirecting the captain to 
a wrong turn on the airport he had endangered the safety of the flight, but 
neither pilot had apparently recognized the significance of that error. 

Yet, even after being misdirected, the captain continued to accept the first 
officer’s guidance. For the next 5 minutes the captain continued to ask the 
first officer, “What’s he want us to do here,” “This a right turn here Jim,” and 
“When I cross this [runway] which way do I go, right?” and other similar
questions. The first officer continued to direct the captain until they crossed 
onto the active runway and encroached upon the path of the departing 
Boeing 727. By the time of the accident, the first officer had provided all taxi 
instructions to the captain, which the captain followed without hesitation. 

The captain made a key operator team error by over-relying on the first 
officer. His error is understandable given his return to active flying after the 
long hiatus, and his likely belief in the first officer’s superior airport knowl- 
edge gained from more recent experience operating at Detroit. The first offi- 
cer’s seeming confidence in his knowledge of the airport routing appears 
to have exacerbated the captain’s preexisting tendency to rely on him while 
navigating on the Detroit airport surface. The interaction of an overly asser- 
tive subordinate, with a tendency to exaggerate his accomplishments and 
knowledge, albeit a tendency the captain could not have been aware of, and 
a superior relatively inexperienced in the circumstances that existed at the 
time, combined to create unique operator team errors. Had the captain relied
less on the first officer and been more attentive to information on their
airport position, the accident might have been avoided.





Summary 


Complex systems often require operator teams, two or more operators work- 
ing toward a common goal or objective. They can enhance team performance 
by helping to prevent errors and mitigate the effects of errors that have been 
committed. Multiple operators can reduce individual workload, and assure 
that necessary tasks are performed during nonroutine or emergency situa- 
tions. Teams depend upon the effective work of each team member and good 
leadership qualities of the team leader. 

Operator teams in complex systems can also commit errors unique to 
teams. These include failing to notice or to respond to another's error, rely- 
ing excessively on others, incorrectly influencing the situation assessment of 
others, and failing to ensure that duties and responsibilities are delegated. 
Antecedents to these errors may lie within the culture or the company, or 
result from other factors specific to operator systems. 

Antecedents of error in both single-operator and operator team systems 
include those discussed in previous chapters, such as equipment design, 
operator factors, company and regulator factors, as well as several that are 
specific to operator teams. These include the shortcomings in the number of 
operators for the task, team structure, team stability, leadership quality, and 
cultural factors that can degrade team performance. 


DOCUMENTING ANTECEDENTS TO 
OPERATOR TEAM ERRORS 


GENERAL 


* Determine the critical errors that are believed to have led to the
event and identify the team members who likely committed 
those errors. 


* Determine the number of tasks that operators attempted to 
perform, the amount of time available to perform those tasks, 
and the actions and decisions of each team member by inter- 
viewing operators, examining recorded data, and referring to 
operating manuals and other documents. 


* Document pertinent antecedents to single-operator type errors 
(such as those resulting from performance or procedural 
deficiencies, discussed in previous chapters) and examine
potentially relevant equipment, operator, and company factors 
if the error appears to be an operator type error. 


OPERATOR TEAM ANTECEDENTS 


* Document the number of operators called for and the number of 
operators involved in system operation at the time of the event. 


* Assess the adequacy of the number of operators available to
perform the tasks in the allotted time. 


* Identify the duties of each team member and determine the
extent to which each team member understood his or her 
duties, and performed them. 


* Determine the length of time that the team members had
worked together as a team. 


* Describe communications among the team members, and 
supervisor/subordinate communications. 


* Examine the ease with which the equipment used enabled
team members to recognize and become aware of the informa- 
tion received by their fellow team members and their actions 
with regard to the system. 


* Document company training, guidelines, and procedures that 
relate to team performance, and assess the extent to which 
these encourage team integration and team performance. 


* Assess the proportion of training, guidelines, and procedures
devoted to team performance and the extent to which they call 
for team, as opposed to individual, operator tasks. 


* Document interpersonal skills of operator applicants and lead- 
ership skills of supervisor applicants by examining company 
selection criteria and history, and interviewing supervisors, 
subordinates, and colleagues. 

* Assess the extent to which the training, guidelines, and proce-
dures pertain to team structure, team member responsibilities, 
and leadership qualities. 
Electronic Data











The telltale recorder, known as a Sensing and Diagnostic Module or 
S.D.M., was one of six million quietly put into various models of General 
Motors [G.M.] cars since 1990. A newly developed model being installed 
in hundreds of thousands of G.M. cars this year records not only the 
force of collisions and the air bag’s performance, but also captures five 
seconds of data before impact. It can determine, for example, whether 
the driver applied the brakes in the fifth second, third second or last sec- 
ond. It also records the last five seconds of vehicle speed, engine speed, 
gas pedal position and whether the driver was wearing a seat belt. 


Wald, 1999 
New York Times 


Devices that record system parameters are found in many complex systems, 
providing valuable investigative data. Traditionally associated with commer- 
cial aircraft, these devices are increasingly found in other systems, including 
railroad locomotives, marine vessels, and, as noted, automobiles. 





Types of Recorders 


In general, two types of devices have been used to capture data in complex 
systems, audio/video recorders and system-state recorders, although on 
occasion devices intended for other purposes may provide helpful informa- 
tion as well. Security cameras, for example, which are proliferating across 
many domains as their cost declines, can provide valuable data to both acci- 
dent investigators and criminal investigators. More recently, the decreasing 
cost of capturing and recording video data has enabled many companies 
who would otherwise not have done so to place video recorders in opera- 
tor consoles to capture images of operators during system operations. Each 
device, whether a camera or direct system recorder, collects information that 
could give investigators insight into the state of the operating system, its 
components and subsystems, the operating environment, as well as operator 
actions. 
Audio/Video Recorders


Because of their prominence in aircraft accident investigations, cockpit voice 
recorders, often referred to as “black boxes,” are among the most well-known 
recorders that investigators use. These record aircraft cockpit sounds in a 
flight’s last 2 hours of operation. 

Audio recorders are also found in other systems. Air traffic control facili- 
ties record communication between pilots and controllers, and electronically 
transmitted voice communications among controllers. Marine vessel traffic 
centers capture communications between vessel operators and ground sta- 
tion personnel, and railroad control facilities record voice communications 
between dispatchers and train crews. 

Video recorders are not extensively used in complex systems at present,
primarily for technical and legal reasons. Until recently, the costs
of rewiring systems to employ video recorders and store the recorded data 
were prohibitive, and the size of recording equipment interfered with sys- 
tem operations. However, technical improvements have lessened the scope 
of these shortcomings and video records of system operations are becoming 
increasingly available to investigators. 

The use of video recorders in accident investigations has also raised legal 
issues that have limited their use (Fenwick, 1999). Concerns such as operator 
privacy, post-event litigation, and unauthorized release of video data have 
yet to be resolved, and many potential users are reluctant to use them until 
these issues are resolved to their satisfaction. However, many investigators 
have called for the installation and use of video recorders to enhance safety 
(e.g., National Transportation Safety Board, 2000). As these calls increase 
and as technical advances continue, video recorder use in complex systems 
will almost certainly increase. Video recordings, whether still or motion, can 
provide critical information on the actions of the operator before the acci- 
dent. Investigators can use this information to determine how closely those 
actions matched other information about the accident, and whether the oper- 
ator performed as appropriate for the particular circumstances at the time. 


System-State Recorders 


System-state recorders are found in many systems. In air transport aircraft 
they continuously record several hundred flight parameters over a 25-hour 
period. The International Maritime Organization has required internation- 
ally operating marine vessels to be equipped with voyage data recorders, 
devices that record the ship’s position, speed, heading, echo sounder, main 
alarms, rudder order and response, hull stresses, and wind speed and 
direction, for 12 continuous hours (Brown, 1999). The U.S. Federal Railroad 
Administration requires trains that can exceed 30 miles per hour be equipped 
with event recorders that record speed, direction, time, distance, throttle 
position, brake application, and, in some cases cab signals, for 48 continuous 
hours (Dobranetski and Case, 1999). 
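
One common way to hold a fixed retention period, such as the 25, 12, or 48 hours noted above, is a rolling buffer in which each new sample displaces the oldest one. The sketch below illustrates that idea only. The class name, the one-second sample interval, and the parameter names are assumptions made for the example, and the sketch does not describe how any particular manufacturer's recorder works.

```python
from collections import deque


class RollingRecorder:
    """Keeps only the most recent samples, discarding the oldest when full."""

    def __init__(self, retention_seconds: int, sample_interval_seconds: int = 1):
        capacity = retention_seconds // sample_interval_seconds
        # A deque with maxlen drops its oldest entry automatically on append.
        self._samples = deque(maxlen=capacity)

    def record(self, timestamp: int, parameters: dict) -> None:
        self._samples.append((timestamp, dict(parameters)))

    def download(self) -> list:
        """Return the retained samples, oldest first, as an investigator would read them."""
        return list(self._samples)


# Hypothetical recorder that retains only 5 seconds of data.
recorder = RollingRecorder(retention_seconds=5)
for t in range(8):
    recorder.record(t, {"speed_mph": 60 - t})

retained_times = [timestamp for timestamp, _ in recorder.download()]
print(retained_times)
```

A practical consequence for investigators is that data older than the retention period is gone, so a recorder that keeps running after an event can overwrite the period of interest.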
The value of the data that system recorders capture was evident in the
investigation of a 1997 passenger train derailment. The accident occurred
after a flash flood had weakened the underlying support of a bridge that
the train was traversing (National Transportation Safety Board, 1998a).
According to investigators,


All four locomotive units were equipped with GE Integrated Function 
computer event recorders... The data from the lead locomotive indi- 
cate that the train was traveling approximately 89 to 90 mph, with the 
throttle in position 3 (with a change to 4 and then 1), when the brake 
pipe pressure decreased from approximately 110 to 0 psi, and the emer- 
gency parameter changed from NONE to TLEM [Train Line Induced 
Emergency]. Within the next 2 seconds, the pneumatic control switch 
(PCS) parameter changed from CLOSED to OPEN. Between 2 and 4 sec- 
onds after the PCS OPEN indication, the position of the air brake han- 
dle changed from RELEASED to EMERGENCY, and the EIE [Engineer 
Induced Emergency] parameter changed from OFF to ON. (p. 39) 


Investigators recognized from these data that the emergency brakes had 
been applied before the derailment, information that was critical to under- 
standing the engineer’s performance. Thus the engineer had attempted to 
stop the train before the derailment, but was unable to do so in time. 
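
Reading recorder data of this kind amounts to placing parameter changes in time order and measuring the intervals between them. The following sketch shows one way that might be done. The parameter changes are those named in the excerpt above, but the timestamps are hypothetical values chosen only to be consistent with the intervals the excerpt describes, and the function names are illustrative.

```python
# Each entry: (seconds, parameter, old value, new value).
# Timestamps are hypothetical; they are not taken from the recorder download.
events = [
    (5.0, "air brake handle", "RELEASED", "EMERGENCY"),
    (0.0, "brake pipe pressure (psi)", "110", "0"),
    (2.0, "PCS", "CLOSED", "OPEN"),
    (0.0, "emergency", "NONE", "TLEM"),
    (5.0, "EIE", "OFF", "ON"),
]


def timeline(events):
    """Order the parameter changes by time, measured from the first change."""
    ordered = sorted(events, key=lambda event: event[0])
    start = ordered[0][0]
    return [(t - start, name, f"{old} -> {new}") for t, name, old, new in ordered]


def elapsed_between(events, first_parameter, second_parameter):
    """Seconds from the change in one parameter to the change in another."""
    times = {name: t for t, name, _, _ in events}
    return times[second_parameter] - times[first_parameter]


for offset, name, change in timeline(events):
    print(f"+{offset:.1f} s  {name}: {change}")

print(elapsed_between(events, "PCS", "air brake handle"))
```

Laid out this way, the order of the brake-related changes and the time separating them can be compared with other evidence about when the derailment began.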

Recorders need not necessarily be physically located within a system to 
capture data. For example, large airports are equipped with detectors that 
record weather data such as ceiling level, visibility, wind direction and 
velocity, barometric setting, and precipitation amount and duration. Some 
electrical generating facility smokestacks are equipped with detectors that 
measure and record wind direction and velocity, and some bridges have the 
ability to capture and record the water levels underneath them. 


Other Electronic Data 


Investigators can often obtain recorded data from a variety of sources, some 
of which may have been implemented for purposes other than accident 
investigation. For example, government agencies, companies, and individu- 
als place security cameras in and around buildings, equipment, yards, and 
other facilities, equipment that could provide information on the actions of 
critical people, as well as changes in lighting, weather, and equipment condi- 
tion. Computers, smart phones or other data storage devices that operators, 
supervisors, and others use may also contain valuable data. 

The value of data from these recorders that were not part of the system 
was apparent in the investigation of a September 1989 accident involving 
a DHC-6, an airplane that was not equipped with recorders at the time 
(National Transportation Safety Board, 1991). Eight passengers and the two 
pilots were killed in the accident. Investigators obtained a video recording 
from a passenger video camera used during the flight. Because there was 
no barrier between the airplane’s cockpit and the cabin, passengers had an
unobstructed view of the pilots. The video showed the pilots’ arm and hand 
movements during the accident sequence, information that demonstrated 
that they had difficulty controlling the airplane during the landing, and had 
attempted to stop the landing to try again. However, in attempting to reject 
the landing each pilot tried to operate the same controls at the same time. 
Their arm motions interfered with each other’s actions, and they rapidly lost 
control of the airplane. The information was invaluable; without it investiga- 
tors would have had substantial difficulty determining the accident’s cause. 

Investigators of a marine accident made a particularly innovative use of 
security camera recordings to determine the angle at which the vessel heeled, 
or turned on its longitudinal axis, during a turn after the operator mistak- 
enly turned the vessel in a series of increasingly greater turns to counter 
what had initially been mildly excessive steering commands by the vessel’s 
integrated navigation system (National Transportation Safety Board, 2008). 
Because of limitations in the instrumentation used to measure and record 
certain data, the vessel’s voyage data recorder showed the vessel heeling to 
15°, the maximum angle the recorder would read. By contrast evidence from 
passenger injuries and damage to objects throughout the vessel suggested 
a greater heeling angle. Video cameras, for example, recorded passengers 
being thrown out of the pool as the pool water moved in increasingly greater 
motion, consistent with increasing heel angles. 

FIGURE 10.1
External security camera photograph of Crown Princess used to determine vessel heel angle.
(Courtesy of the National Transportation Safety Board.)


As can be seen in Figure 10.1, investigators noted the time stamped onto the
image of a security camera that captured part of the external side of the
vessel, and the sun’s angle on the horizon in the photo. They then measured
the difference between the “angle of the shadow created by the vessel on a
reference point on the images and the angle that would have been created by
the ship’s orientation to the sun at that time, given the sun’s angle over the
horizon and the ship's orientation” (p. 20). From this they determined that the
actual heeling angle was 24°, an angle well beyond what cruise vessel
passengers could reasonably expect to encounter on a vacation cruise. By using
data from a security camera, investigators were able to determine the actual
vessel heeling angle more accurately than was measured by the instrument
installed on the vessel to do so. As they described in the text accompanying
Figure 10.1,


Image of the Crown Princess taken by a ship’s video camera at the maximum 
angle of heel, with reference lines added by investigators. Stamped time 
corresponds to 1525:02 eastern daylight time. The apparent bending of the 
horizon is an artifact of the wide-angle camera lens, which causes straight 
lines to appear curved and bow outward from the image center. (p. 21) 
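
The two ideas in this example, an instrument that stops reading at its maximum and an estimate taken as the difference between an observed and an expected shadow angle, can be sketched in a few lines. This is a simplified illustration that assumes the image angles have already been corrected for lens distortion. The 15° limit comes from the account above, while the function names and the sample angles are hypothetical and are not the investigators' measurements.

```python
def recorded_heel(actual_deg: float, max_reading_deg: float = 15.0) -> float:
    """Reading from an instrument that cannot report beyond max_reading_deg."""
    return max(-max_reading_deg, min(max_reading_deg, actual_deg))


def heel_from_shadow(observed_shadow_deg: float, expected_shadow_deg: float) -> float:
    """Heel estimated as observed shadow angle minus the angle expected
    from the sun's position and the ship's orientation at the stamped time."""
    return observed_shadow_deg - expected_shadow_deg


# Hypothetical angles for illustration only.
estimate = heel_from_shadow(observed_shadow_deg=38.5, expected_shadow_deg=20.0)
print(estimate)                 # independent estimate from the image
print(recorded_heel(estimate))  # what a recorder limited to 15 degrees would show
```

Whenever a recorded value sits exactly at an instrument's limit, as in this sketch, the true value may have been higher, and an independent source is needed to establish it.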





System Recorder Information


Audio Recorders


Audio recorders can provide real-time information on both the operator and 
the equipment. 

The operator. Audio recordings reveal operators’ verbal interactions in oper- 
ator teams. For example, in airline operations pilots perform procedures in 
strict order, established on checklists that are specific to the different operat- 
ing phases. Generally, one pilot identifies the checklist item and the other 
performs and articulates the action taken in response, or describes the sta- 
tus of a particular component. Recordings of pilot statements or comments 
can help investigators determine whether they completed the required tasks, 
and the sequence in which they performed the tasks. 

This information was particularly helpful in the investigation of an August 
1987 MD-80 accident in Detroit, Michigan (National Transportation Safety 
Board, 1988). The pilots were following the checklist while they taxied the 
airplane from the terminal to the runway. The checklist included a step that 
called for one pilot to extend the flaps and slats for takeoff, and the other 
pilot to verify that this had been accomplished, a critical action because
taking off with the flaps and slats retracted jeopardizes the safety of flight.

Audio recorder data revealed that the pilots’ checklist review was inter- 
rupted when an air traffic controller requested information from them. The 
pilots responded to the controller and resumed the checklist tasks but at 
the wrong checklist location, inadvertently omitting several required steps, 
including verifying the flap and slat extension. They attempted to take off
but the airplane was unable to climb. It crashed shortly after the start of 
the takeoff, killing all but one of the more than 150 passengers and crew 
onboard. Cockpit voice recorder data enabled investigators to learn not only 
the nature of the operators’ error, but the context in which they committed
the error as well, giving investigators a fairly comprehensive perspective on 
the pilots’ error. 
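
Comparing the items heard on a recording against the required checklist is, in essence, a search for gaps in an ordered list. The sketch below is a minimal illustration of that comparison. Apart from the flap and slat item mentioned above, the step names are placeholders rather than items from the actual checklist, and the function names are hypothetical.

```python
def omitted_steps(required, heard):
    """Required steps, in checklist order, that were never heard on the recording."""
    heard_set = set(heard)
    return [step for step in required if step not in heard_set]


def resume_gap(required, last_before_interruption, first_after_interruption):
    """Steps skipped when a checklist is resumed at a later point than it was left."""
    start = required.index(last_before_interruption) + 1
    end = required.index(first_after_interruption)
    return required[start:end]


# Placeholder checklist; only "flaps and slats" is named in the account above.
required = ["step 1", "step 2", "flaps and slats", "step 4", "step 5"]
heard = ["step 1", "step 2", "step 5"]

print(omitted_steps(required, heard))
print(resume_gap(required, "step 2", "step 5"))
```

The second function reflects the context of the error described above: the omission is explained by where the crew resumed after the interruption, not by a decision to skip the step.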

Audio recorder information can also complement other operator-related 
data. For example, after an extensive inquiry, investigators of a September 1994 
accident involving a Boeing 737 that crashed near Pittsburgh, Pennsylvania, 
determined that the rudder had abruptly moved to one side just before the 
accident (National Transportation Safety Board, 1999). This caused the airplane 
to turn left and dive abruptly to the ground. Investigators had to determine 
whether the airplane's turn had been initiated by pilot action or by the rudder 
itself, because the flight recorder data showed the turn but not its source. 

Investigators used different techniques to understand the cause of the turn. 
They analyzed sounds that the pilots made during the accident sequence 
and compared them to the sounds that they had made during routine por- 
tions of the flight. By examining elements of pilot sounds, including voice 
pitch, amplitude, speaking rate, and breathing patterns, investigators deter- 
mined that, 


The first officer emitted straining and grunting sounds early in the upset 
period, which speech and communication experts stated were consis- 
tent with applying substantial physical loads; the CVR [cockpit voice 
recorder] did not record any such sounds on the captain’s microphone 
channel until just before ground impact. After about 1903:18 (about 5 
seconds before ground impact) ... the captain’s breathing and speech patterns 
recorded by the CVR indicated that he might have been exerting 
strong force on the controls. (pp. 247-248) 


These sounds, with other information, convinced investigators that the 
pilots did not initiate the turn and subsequent dive. The straining and grunt- 
ing sounds heard on the recording were characteristic of those made during 
utmost physical exertion. Pilots would make these sounds when forcefully 
attempting to counteract a maneuver, not when initiating one that would 
have taken little physical effort. 

Audio recorder data can also reveal operators’ perceptions of the events 
they are encountering, giving investigators critical information about their 
decision making. For example, in the January 1982 accident of a Boeing 737 
that crashed in Washington, D.C., cockpit voice recorder information showed 
that neither pilot understood the meaning of engine performance display 
data (National Transportation Safety Board, 1982). 

The flight had been delayed during a snowstorm. To speed their departure 
from the gate, the captain inappropriately applied reverse thrust, designed to 
redirect jet engine thrust forward to slow the airplane on landing. On some 
aircraft, reverse thrust was permitted for exiting the gate area. However, on 
airplanes with wing-mounted engines, such as the Boeing 737, the engines 
are close to the ground and the use of reverse thrust in the terminal area 
could redirect debris into the front of the engines and damage them. 
The pilot's use of reverse thrust at the gate caused snow and ice to block 
critical engine probes in the front of the engines, invalidating the data that 
several of the five engine displays presented. Four gauges, which displayed 
data from internal engine functions, presented accurate information. With 
two engines on the Boeing 737, five engine-related displays were presented in 
each of two columns, one for each engine, a total of 10 gauges in all. The two 
accurate displays were located in the topmost of the five rows (Figure 10.2), 
the ones that measured engine RPM and were not dependent on the blocked 
engine probes, as were the gauges in the remaining four rows. The analog 
gauges presented conflicting information. Digital displays in modern 
aircraft would likely present the same conflicting data (because regulatory 
requirements specify the engine data to be displayed), but other displays 
would warn of the discrepancies between the presentations. 

After he applied takeoff thrust, the first officer recognized that the engine 
instruments were providing unexpected information. Yet neither pilot 
could understand the nature of the unfamiliar data or the significance of 
the presented information. They attempted the diagnosis while the airplane 
was rolling for takeoff, just seconds before they had to decide whether to 
continue the takeoff or stop the airplane on the remaining runway while it 
was still safe to do so. Neither 
pilot appeared to have encountered the data previously, either in training or 
during an actual flight. The first officer asked the captain, “That don't seem 
right, does it?” Three seconds later he again said, “Ah that’s not right.” The 
captain responded, “Yes it is, there's eighty [knots].” Almost immediately, 
the first officer answered, “Nah, I don’t think that’s right.” Nine seconds later 
he again expressed uncertainty, “Ah, maybe it is,” he said. Four seconds later, 
after the captain declared that the airplane’s speed had reached 120 knots, 
the first officer said simply, “I don’t know.” They continued with the takeoff, 
and the accident occurred 38 seconds later. 

The pilots’ comments, with other data, showed investigators that 


* The pilots were unable to interpret the displayed engine data in the 
time available to make an informed go/no-go takeoff decision 
* The captain misinterpreted the displayed data 
* The first officer was uneasy with the captain's interpretation, and 
* He nevertheless acceded to it 


This information allowed investigators to understand the nature of the 
crew's decision making and suggest strategies to improve pilot performance 
in similar situations. 

FIGURE 10.2 

Engine data for 737-200, resembling displays on accident aircraft. Displays on 2nd row from 
the top point to the 10:00 position, all others to the 8:00 position. (Courtesy of the National 
Transportation Safety Board.) 


Recorded audio data can also allow investigators to compare changes in 
operator vocalizations, potentially revealing much about operator 
performance. For example, investigators of the 1989 grounding of the oil tanker 
Exxon Valdez, in Alaska's Prince William Sound, compared changes in the 
tanker master’s voice during communications with the U.S. Coast Guard's 
Port of Valdez vessel traffic center 33 hours before, 1 hour before, immediately 
after, 1 hour after, and 9 hours after the grounding (National Transportation 
Safety Board, 1990). His speech rate significantly slowed, and other vocal 
characteristics, such as articulation errors, were found that were consistent 
with the effects of alcohol consumption. With other evidence, the recorded 
audio information supported investigators’ conclusions that the master was 
impaired at the time of the grounding, and that his alcohol-related 
impairment contributed to the accident. 


The Equipment 


Recorded data can disclose critical features of aurally presented information 
such as alerts, their sound characteristics, time of onset and of cessation, and 
operator statements in response to these sounds. Investigators used this infor- 
mation in their investigation of a 1996 Houston, Texas, accident in which the 
pilots of a DC-9 failed to extend the landing gear before landing, causing 
substantial damage to the airplane (National Transportation Safety Board, 1997a). 

Airplanes are required to have alerts that sound if the pilots do not extend 
the landing gear before landing. Investigators sought to determine whether 
the warnings alerted, and if so, the nature of the pilots’ response. Cockpit 
voice recorder data revealed that the pilots had omitted a critical step on 
the pre-landing checklist, which called for one pilot to engage the hydraulic 
system, the mechanism that powers the flaps and the landing gear, and the 
other to verify that the system had been engaged. However, because they 
had omitted this step and did not engage the hydraulic system, they were 
unable to extend the flaps and landing gear. Although they knew that they 
could not lower the flaps, the cockpit voice recorder indicated that they did 
not realize that they had not extended the landing gear. 

Cockpit voice recorder information revealed that an audible alert, indicat- 
ing a retracted landing gear, sounded before landing. Concurrently another 
more prominent alert, the ground proximity warning system alert, was also 
heard. The simultaneous sounding of the two alerts (both the result of a single 
phenomenon, the retracted gear) interfered with the pilots’ ability to determine 
the cause of the alerts. Instead, they focused on maintaining a safe landing 
profile and did not recognize that the gear had not been extended. 

Audio recorders may, on occasion, document information that they had 
not been designed to capture. For example, in the Washington, DC, Boeing 
737 accident discussed previously, the cockpit voice recorder recorded 
changes in the engine pitch, corresponding to increases in engine thrust for 
the takeoff. Investigators analyzed these sounds to measure the approximate 
amount of thrust that the engines generated, a parameter that flight data 
recorders capture today but did not at that time. 

The analysis showed that the amount of thrust actually generated was con- 
siderably less than the amount the pilots had attempted to establish, and less 
than what they believed the engines had been generating. As shown in Figure 
10.2, the amount actually generated was consistent with data that the top two 
rows of the 10 engine-related gauges displayed, the gauges on which the cap- 
tain was primarily focusing, but inconsistent with data that the other gauges 
displayed. The discrepancy between the amount of engine thrust actually 
generated and the amount the pilots expected proved critical to understand- 
ing the accident. The thrust actually generated was insufficient to overcome 
other adverse weather-related characteristics of the flight. However, because 
the pilots focused primarily on the two gauges that displayed the engine 
RPMs, believing that they would indirectly show the amount of thrust actually 
generated, they were unable to understand or resolve the discrepancy. 


System-State Recorders 


System-state recorders can provide data captured in the period leading up to 
and through the event that give extraordinary insights into operator actions 
and system responses. In railroad operations, for example, event recorder 
data describe several aspects of system responses to engineer actions. These 
data alone would be quite valuable, but when combined with other 
information, such as obstructions to visibility, track curvature, grade, and 
bank angle, they could enable investigators to identify antecedents to 
operator errors and understand their effects on operator performance. 

The value of system-state recorder information was evident in the inves- 
tigation of the January 1997 crash of an Embraer Brasilia, in Michigan 
(National Transportation Safety Board, 1998b). The flight data recorder cap- 
tured 99 parameters of airplane performance, information that, combined 
with other recorded data from the cockpit voice recorder, air traffic control 
radar and communications, and meteorological sources, gave investigators a 
comprehensive understanding of the state of the airplane and of the operator 
actions up to the accident. 

These data showed that the pilots had slowed the airplane to a speed that 
would ordinarily have been acceptable for safe flight. However, analysis of 
the airplane's flight path suggested that it had likely passed through an area 
of icing just before the accident, leading to ice accumulation that investiga- 
tors determined was likely imperceptible. Its flight characteristics were con- 
sistent with those of an aircraft adversely affected by ice accumulation on its 
wings. Although the airspeed would otherwise have been adequate, the ice 
accretion caused a control loss at the particular airspeed that the pilots had 
established in those meteorological conditions because ice contamination on 
an airplane’s airfoil or wing increases the speed at which the airplane will 
stall. What had been an acceptable speed for safe flight became an insuffi- 
cient speed because of the ice contamination. 
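
The reasoning in the preceding paragraph can be illustrated with the standard lift equation, in which stall speed rises as the wing's maximum lift coefficient falls. The sketch below is illustrative only: the weight, air density, wing area, and lift coefficients are hypothetical values chosen for the arithmetic, not figures from the accident report, and the sketch assumes that ice affects only the maximum lift coefficient (in practice, ice also adds weight and drag).

```python
import math


def stall_speed(weight_n, air_density, wing_area_m2, cl_max):
    """Level-flight stall speed in m/s from the standard lift equation.

    At the stall, lift equals weight: W = 0.5 * rho * V**2 * S * CL_max,
    so V = sqrt(2 * W / (rho * S * CL_max)).
    """
    return math.sqrt(2.0 * weight_n / (air_density * wing_area_m2 * cl_max))


# Hypothetical, illustrative values; not taken from the accident report.
WEIGHT_N = 100_000.0   # airplane weight, newtons
AIR_DENSITY = 1.0      # kg per cubic meter
WING_AREA_M2 = 40.0    # square meters
CL_MAX_CLEAN = 1.6     # assumed maximum lift coefficient, clean wing
CL_MAX_ICED = 1.2      # assumed maximum lift coefficient, contaminated wing

clean = stall_speed(WEIGHT_N, AIR_DENSITY, WING_AREA_M2, CL_MAX_CLEAN)
iced = stall_speed(WEIGHT_N, AIR_DENSITY, WING_AREA_M2, CL_MAX_ICED)

# A speed with a 10% margin above the clean-wing stall speed.
margin_speed = 1.1 * clean

print(f"clean stall speed: {clean:.1f} m/s")
print(f"iced stall speed:  {iced:.1f} m/s")
print(f"speed flown:       {margin_speed:.1f} m/s")
```

With these assumed numbers, a speed that leaves a margin above the clean-wing stall speed falls below the stall speed of the contaminated wing, which is the situation the investigators described.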


Integrating Information from Multiple Recorders 


When combining data recorded in different recorders, applying a common 
standard or metric to align and match the data helps to clarify the often 
diverse information. Because many recorders capture elapsed time, the use 
of a standard time common to the various recorded data allows investiga- 
tors to compare data from multiple recorders. For example, this process 
allows one to compare operator statements obtained from audio recorders 
to parameters obtained from system recorders, to assess changes in operator 
statements that may relate to changes in other system features. 
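
A minimal sketch of this alignment step follows. It assumes that each recorder keeps its own elapsed-time clock and that at least one event can be located on every recording (for example, a radio transmission that also appears on a traffic control recording). The function names, offsets, timestamps, and data values are all hypothetical and serve only to show the technique.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone


def to_common_time(elapsed_s, sync_elapsed_s, sync_utc):
    """Convert a recorder's elapsed time (seconds) to the common clock.

    sync_elapsed_s and sync_utc describe one event whose time is known both
    on the recorder's own clock and on the common clock.
    """
    return sync_utc + timedelta(seconds=elapsed_s - sync_elapsed_s)


def nearest_sample(samples, when):
    """Return the (time, value) sample closest in time to `when`.

    `samples` must be sorted by time.
    """
    times = [t for t, _ in samples]
    i = bisect_left(times, when)
    candidates = samples[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s[0] - when))


# Hypothetical sync event: the same transmission located on each recording.
sync_utc = datetime(2000, 1, 1, 14, 10, 0, tzinfo=timezone.utc)
AUDIO_SYNC_S = 1500.0    # elapsed seconds on the audio recording
SYSTEM_SYNC_S = 86000.0  # elapsed seconds on the system-state recording

# Hypothetical data: (elapsed seconds, content) on each recorder's own clock.
audio_events = [(1503.0, "unidentified sound"), (1512.4, "crew remark")]
altitude_ft = [(86000.0 + k, 7400 - 10 * k) for k in range(20)]

# Put the system-state samples on the common clock.
altitude_utc = [(to_common_time(t, SYSTEM_SYNC_S, sync_utc), v)
                for t, v in altitude_ft]

# Pair each audio event with the system-state sample nearest to it in time.
aligned = []
for t, text in audio_events:
    when = to_common_time(t, AUDIO_SYNC_S, sync_utc)
    _, altitude = nearest_sample(altitude_utc, when)
    aligned.append((when, text, altitude))

for when, text, altitude in aligned:
    print(when.time(), text, altitude)
```

Matching to the nearest sample is one simple choice; interpolating between samples is another, and which is appropriate depends on how quickly the parameter changes relative to the recorder's sampling interval.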

Investigators of the May 1996 DC-9 accident in the Florida Everglades 
plotted the data from the flight data recorder, cockpit voice recorder, and 
air traffic control radar on one diagram to create a three-dimensional plot of 
the airplane's flight path, displayed in Figure 10.3 (National Transportation 
Safety Board, 1997b). By comparing data from the three sources of recorded 
information, investigators found that, 


The flight was normal until 1410:03, when an unidentified sound was 
recorded on the cockpit voice recorder. At 1412:58, after about 30 seconds 
at 7,400 feet msl altitude with a gradual heading change to 192°, the 
radar indicates an increasing turn rate from the southerly direction to 
the east and a large increase in the rate of descent. Flight 592 descended 
6,400 feet (from 7,400 feet to 1,000 feet) in 32 seconds. Computations of 
airspeed, based on radar data, indicate that the airspeed of flight 592 was 
more than 400 KIAS and increasing at the time of ground impact, which 
occurred about 1413:40. (pp. 55, 58) 


The constructed plot, Figure 10.3, shows the flight path, altitude, and 
position of the airplane, selected background sounds, and pilot statements to 
air traffic controllers. The plot presents a readily interpretable picture of the 
airplane’s path, as the pilots gave controllers increasingly urgent accounts of 
the smoke and fire in the airplane. 

FIGURE 10.3 
Ground track of ValuJet flight 592, using information from air traffic control radar, and cockpit 
voice and flight data recorders. (Courtesy of the National Transportation Safety Board.) 





Assessing the Value of Recorded Data 


Audio Recorder Data 


The quality of the recording components and the level of ambient noise 
affect the quality of audio recorder data. Defects in microphones, recording 
media, and drive speed can degrade the recorded sound quality and detract 
from the ability to identify and interpret the sounds. The distance of the 
microphone from the operators or system sounds also affects sound quality, 
unless microphones that are designed to detect sounds at great distances are 
used. In general, the greater the distance between the microphone and the 
sounds that are being recorded, the lower the quality of the recorded data. 


System-State Recorder Data 


The quality of system-state recorder data, though relatively immune to many 
of the features that could degrade audio recorder data, is primarily 
influenced by two factors: the frequency with which the data are sampled and 
the number of system parameters that are recorded. Because at any one point 
complex systems measure a potentially unlimited number of parameters, the 
more parameters that are sampled and recorded, the more comprehensive 
the subsequent portrait of the system. Recorders that capture hundreds of 
system parameters provide a more comprehensive, and hence more valu- 
able, description of the system and its operating environment than those that 
record only a few parameters. 

Similarly, because of the dynamic nature of many complex systems, the 
more often system recorders obtain and record the data, the more accurate 
the view of the system that is obtained. A device that captures data every 
second gives a more complete, and hence more accurate, account of system 
operations than one that captures data every third second. 
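
The effect of sampling frequency can be seen in a small sketch. The signal below is hypothetical: a parameter that stays at zero except for a two-second excursion. A recorder that samples every second captures the excursion, while one that samples every third second happens to miss it entirely.

```python
def sample(signal, every_s):
    """Keep one reading every `every_s` seconds from a once-per-second signal."""
    return signal[::every_s]


# Hypothetical parameter, one value per second: steady at 0 except for a
# 2-second excursion at t = 4 s and t = 5 s.
signal = [0, 0, 0, 0, 9, 9, 0, 0, 0, 0, 0, 0]

every_second = sample(signal, 1)  # readings at t = 0, 1, 2, ...
every_third = sample(signal, 3)   # readings at t = 0, 3, 6, 9

print("peak seen at 1-second sampling:", max(every_second))
print("peak seen at 3-second sampling:", max(every_third))
```

Whether a coarser recorder misses a given event depends on when its samples happen to fall, so the sketch shows what can be lost, not what will always be lost.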





Summary 


Many systems are equipped with devices that record critical information 
about system equipment, components, operating environments, and 
operator actions. Because of technological innovations and other factors, 
relatively inexpensive video cameras are often found in or near accident 
sites, thus potentially providing valuable information about a system in the 
moments before an accident. Images from video recordings can be applied 
in innovative ways to enhance investigators’ understanding of accident 
causation. Audio recorders chronicle operator comments and other sounds 
heard in operating stations, and system-state recorders record key system 
parameters. 


INTERPRETING RECORDED DATA 


* Match pertinent data against a common metric, such as elapsed 
time, local time, or Coordinated Universal Time (UTC), often 
referred to as Greenwich Mean Time, when examining data derived 
from multiple recorders. 


* Select the parameters that best reflect the overall system state, 
the components of greatest benefit to the issues of the investi- 
gation, and that offer the most information on operator deci- 
sions and actions, if considerable recorded data are available. 


* Develop multiple data plots, or use multiple time intervals as 
the period of interest increases or decreases, when numerous 
system-state parameters have been recorded. 


* Determine an appropriate interval to be used when examining 
recorded data, taking into account the number of parameters, 
the proximity to the event, and the number of changes that the 
system is undergoing at the time. 


Interviews 











Did I see DiMaggio famously kick the dirt as he reached second, a 
moment replayed on countless television biographies of him because it 
was the rarest display of public emotion on his part? Again, I think I did. 
Who knows? Memory is often less about truth than about what we want 
it to be. 


Halberstam, 2000 
New York Times 


Interviews can provide information unobtainable from other sources and 
give investigators unique insights into critical operator errors. Their value 
is undeniable and nearly all investigations rely heavily on interviews. Yet 
many investigators conduct interviews poorly, largely because they do not 
recognize that interviewing, like other aspects of investigations, calls for 
unique skills. Experienced interviewers understand that the conduct of an 
interview can affect its outcome. No information collected in an investiga- 
tion is as susceptible to variations in investigative technique as are interview 
data. The interviewees selected, the questions asked, and the interviewing 
methods used are just some of the elements that can affect interview 
quality. This chapter will examine interview quality and discuss methods to 
enhance the quality and quantity of interview information. 





Memory Errors 


To understand interviewing skills, it is necessary to first understand how 
memory functions and what influences memory errors. Research has shown 
that people do not passively receive and record information as memories; 
rather, they actively reconstruct memories (Buckhout, 1974; Haber and Haber, 
2000). As researchers have described it, 


In essence, all memory is false to some degree. Memory is inherently 
a reconstructive process, whereby we piece together the past to form a 
coherent narrative that becomes our autobiography. In the process of 
reconstructing the past, we color and shape our life’s experiences based 
on what we know about the world. Our job as memory researchers and 
as human beings is to determine the portion of memory that reflects 
reality and the portion that reflects inference and bias. This is no sim- 
ple feat, but one worthy of our continued investigation. (Bernstein and 
Loftus, 2009, p. 373) 


Memory reconstructions, and hence memories, are influenced by people’s 
experiences, attitudes, motives, and beliefs. Because of these and other 
differences among people, those experiencing the same event 
may have different memories of it. This can be especially true for dynamic 
events, as Buckhout (1974) explains, describing various influences on the 
accuracy of recollections of dynamic events, 


The length of the period of observation obviously limits the number of 
features a person can attend to. Yet, fleeting glimpses are common in 
eyewitness accounts, particularly in fast-moving situations. Less than 
ideal observation conditions usually apply; crimes [and incidents and 
accidents] seldom occur in a well-controlled laboratory. Often distance, 
poor lighting, fast movement or the presence of a crowd interferes with 
the efficient working of the attention process. (p. 25) 


Errors in the recall of dynamic events were evident in the information 
that over 670 eyewitnesses gave investigators in the investigation of the 1996 
in-flight explosion of a Boeing 747 off the coast of Long Island, New York 
(National Transportation Safety Board, 2000). Over 250 of the eyewitnesses 
described aspects of the event that were directly contradicted by the physical 
evidence; they claimed to have seen a streak of light or flame ascend to the 
airplane. However, the physical evidence was unequivocal; flames fell from 
the airplane, not the other way around. 

Hyman (1999) describes three categories of memory errors that people can 
make: incorrectly reconstructing event recollections, incorrectly attributing 
the source of information, and falsely believing that events that were not 
experienced had been experienced. To Hyman, memory errors occur because 
people “view the event as plausible, they construct a memory that is partially 
based on true experience and that is often very vivid, and they erroneously 
claim the false memory as a personal recollection” (p. 247). In other words, 
people may reconstruct memories by applying information from their previ- 
ous experiences to fill in gaps, or to help explain phenomena they observed 
that were difficult to explain. 

Interviewers can also influence interviewee recollection and response. 
Loftus (1997) showed that interviewers can subtly insinuate false information 
into their questions, which interviewees may then believe as facts witnessed 
or experienced. Wells, Malpass, Lindsay, Fisher, Turtle, and Fulero (2000), 
summarizing the literature on eyewitness recollection errors, conclude that, 


The scientific proof is compelling that eyewitnesses will make system- 
atic errors in their reports as a function of misleading questions. From a 
system-variable perspective, it matters little whether this effect is a result 
of introducing new memories or altering old memories or whether this 
is a compliance phenomenon. The important point is that witnesses will 
extract and incorporate new information after they witnessed an event 
and then testify about that information as though they had actually wit- 
nessed it. (p. 582) 


Despite the potential for memory errors, witnesses can provide valuable 
information. Knowing the influences on memory, interviewers must 
recognize the substantial influence they can exert on the quality of the 
interview itself and take steps to enhance interviewee recall and reduce 
opportunities for memory errors. They should begin doing so even before 
the interview starts. 





Interviewees and Their Concerns 


Some interviewees have much at stake in the findings of an investigation 
while others have little or nothing at stake. Differences in background, 
experience, and education can also affect people's ability to understand and 
respond to questions, and their willingness to assist an investigation. 

Interviewees in accident and incident investigations generally fall into one 
of three groups, according to their relationship with the operator or their role 
in the event. These include those who have observed the event, operators 
whose actions are the primary focus of the investigation, and those who are 
familiar with critical system elements but may not have been directly involved 
in operating the system at the time of the accident. Each may have important 
information to contribute, according to his or her knowledge and insights. 


Eyewitnesses 


Eyewitnesses to an event may have observed features that system record- 
ers did not capture, heard noises beyond the microphone range, smelled 
odors associated with certain phenomena, or felt movement that no device 
recorded. Their observations and experiences may enhance or confirm 
existing information and add information unavailable from other sources. 

Eyewitness willingness to cooperate with investigators will likely be 
influenced by their confidence in the value of the information they can provide the 
investigation, and possible concern with the interviewer's approval of their 
responses. They will likely offer information more readily if they believe both 
that it will help the investigation and that interviewers will appreciate their 
cooperation. Thus, it is incumbent on those interviewing eyewitnesses to unam- 
biguously describe the potential contribution of the eyewitness to the investiga- 
tion, and convey the investigators’ appreciation to them for their cooperation. 


System Operators 


Operators may be able to describe their actions and decisions during the event 
and provide helpful background information about the system. However, if 
the event that occurred was dynamic, they may be unable to recall details, 
and if they feel responsible for the event, they may have difficulty respond- 
ing. Nonetheless, at a minimum, they should be able and willing to describe 
their experiences in system operations and thus offer insights regarding sys- 
tem design and company policies and practices. Investigators should rec- 
ognize that operators may be concerned about the effects of the event on 
their careers. The accident being investigated may be the most challenging 
and difficult event they have encountered in their professional experience. 
If operators’ feelings of personal responsibility or career concerns adversely 
affect their ability to recall events, interviewers can do little other than delay 
the interview until the operators are sufficiently composed and distant from 
the event to effectively answer questions. 


Those Familiar with Operators and Critical System Elements 


Those at the system’s “blunt end,” in Reason’s terms, are familiar with 
critical elements of the system. They serve as the equipment designers, training 
managers, procedures specialists, financial managers, or direct supervisors. 
Their decisions and actions may have set in motion the circumstances that 
led to the errors of the operators at the “sharp end.” 

Those close to the operators, such as family and coworkers, may share 
many of their concerns. Depending on the information and the availability 
of the operators, investigators should obtain from them details about recent 
operator experiences such as sleep and work habits, and recently encoun- 
tered stressful events. This information may give investigators better insight 
into the event than they might otherwise obtain. 

Operators or those familiar with them may also feel responsible for the cause 
of the event. For example, a maintenance supervisor may believe that he or she 
was responsible for the errors of the technicians whom he or she supervised, or 
for the quality of the procedures that they followed. Unfortunately, little can 
be said to address these concerns. Even so, these interviewees can describe 
their decisions and the actions they took that may have influenced the nature 
of the event. 





Information Sought 


The information sought in an investigation varies according to the role of the 
interviewee in the event being investigated, whether eyewitness, operator, or 
someone familiar with critical system elements. 


Eyewitnesses 


Eyewitnesses should be asked to describe 


* What they saw, heard, felt, and smelled 
* Details of the event that first caught their attention 
* Time of day they witnessed the event 
* Their own location and activities during the event 
* Operator actions 
* The names and locations of other eyewitnesses if known 
* Additional information they believe relevant 


Operators 


Operators should be asked to provide information about the event, and 
additional personal and company-related information, including 


Event-related actions and decisions 


* Decisions they made before the event 
* Approximate time when they made those decisions 
* Actions they took before the event 
* Approximate time when they took those actions 
* Outcome and consequences of each 


Job/task information 


* Their job/task duties and responsibilities in general 
* Knowledge requirements of the job/task 
* System operating phases and their approximate time intervals 
* Their responsibilities, activities, and workload during each operating 
phase 
* Abnormal situations and the frequency with which they have been 
encountered 
* Their responses to abnormal situations 


Company practices and procedures 


* General system operating practices and procedures 
* Task-specific practices and procedures 
* Differences between company intent and actual practice in system 
procedures 


Personal information 


* Overall health and recent illnesses, physician visits, and 
hospitalizations 
* Major changes in family and/or job status 
* Medications taken within previous 30 days, including prescribed 
and over-the-counter medications, and herbal supplements 
* Sleep schedule previous 72 hours (longer if they can recall) 
* Activities previous 72 hours (longer if they can recall) 


Those Familiar with Critical System Elements 


Those who are acquainted with the system operators on duty at the time of 
the accident should be asked for information that is unavailable from other 
sources, or that adds to existing information about the event. In general, the 
operator's close relatives or colleagues should be asked about the operator's 


* Sleep and rest schedule previous 72 hours 
* Activities previous 72 hours 
* Opinions expressed toward the job, coworkers, and the company 


Those who are familiar with the system should be asked about the follow- 
ing, according to their expertise and role in system operation 


* Operator training and work history 
* Training program history and description 
* Operating policies and practices 
* General company policies and practices 





The Cognitive Interview 


Regardless of the interviewee, interviewers should always ask questions 
in a way that enhances interviewees' ability to provide the maximum 
information possible. In the mid-1990s, DNA evidence established the 
innocence of a number of people who had been convicted, primarily of 
sexual assault and largely on the basis of eyewitness evidence (Loftus, 
2013). These exonerations prompted research into how eyewitnesses had 
misidentified suspects. The studies showed that in most cases the 
witnesses fully believed that the individuals they had identified had 
committed the crimes, but that they had come to identify the wrong 
individuals largely as a result of the manner in which law enforcement 
officials had interviewed them. 

Around the same time, some adult women claimed that close relatives had 
sexually abused them when they were children but that, because of the 
psychological trauma associated with the assaults, their memories of those 
assaults had been repressed until adulthood. However, as Bernstein and Loftus (2009, p. 
372) observed, “many cases of allegedly recovered memories have turned 
out to be false memories implanted by well-meaning therapists who use 
suggestion and imagination to guide the search for memories.” These often 
high-profile instances led to research showing that events that had never 
occurred could be “planted” in people’s memories, so that they firmly 
believed that the events they described had in fact occurred. The 
research also illustrated how memories of events are reconstructed, and 
that people may, without recognizing it, fill in “gaps” in their observations 
of events to make “complete” memories. Similar research into interview- 
ing showed that interviewer suggestibility and manner of asking questions 
affect the responses that interviewees provide. Today, as a result of research 
into both memory development and interviewing, a number of interview- 
ing techniques have been developed to enhance interviewee recollection. 
Foremost among them is the cognitive interview, which as researchers 
describe, 


...represents the alliance of two fields of study: communication and 
cognition. The social-psychological concerns of managing a face-to-face 
interaction and communicating effectively with a witness were inte- 
grated with what psychologists knew about the way people remember 
things. (Wells, Memon, and Penrod, 2006, p. 55) 


Cognitive interviewing assumes that interviewees want to provide maxi- 
mum information to interviewers, and by conducting interviews in the 
appropriate environment with effective techniques, interviewers can maxi- 
mize the information they obtain in interviews. 


Rapport 


A critical element of cognitive interviewing calls on the interviewer to 
establish and maintain rapport with the interviewee. Research has demonstrated
that when interviewers establish rapport with interviewees, the quality and
quantity of information the interviewees provide increase (Collins, Lincoln,
and Frank, 2002; Kieckhaefer, Vallano, and Compo, 2014). Interviewer-interviewee
rapport can lessen interviewee anxiety and provide a personal connection
between the two so that the interviewee will want to assist the interviewer.
Interviewers need to establish and maintain rapport with all interviewees, even
operators suspected of causing major accidents. A disapproving tone, lack of
concern, and subtle expressions of disapproval can quickly make the interviewee
reluctant to provide additional information to the interviewer. By contrast,
concern for the interviewee's well-being, a neutral tone, and close attention
to the interviewee can enhance rapport and hence the amount of information
provided.

Geiselman and Fisher (2014) recommend that before beginning interviews,
interviewers exchange pleasantries with interviewees by asking them to describe
what they do on a typical day, their family activities, and so on. Additional
techniques include explaining to the interviewee the role of interviewee
information in the investigation and how the information will assist
investigators, and attending to interviewee needs and concerns throughout the
interview.

Because of the importance of personal contact in establishing rapport,
particularly when interviewing operators and others knowledgeable about the
system, interviewers should conduct interviews face-to-face when possible
rather than by electronic means, such as the video or audio interactions
available through various telephone, software, and smartphone applications.
However, practicality is also important. If interviewers need to question
numerous eyewitnesses and sufficient interviewers are not available,
face-to-face interviews with the eyewitnesses may not be practical. In that
case, conduct the interviews by electronic means, such as computer-based video
or audio.


Asking Questions 


The cognitive interview assumes that interviewees retain memories of the
events they are asked to describe, and that interviewees can best access those
memories when they are allowed to follow their own recollections rather than
being directed by the interviewer. That can be done most readily by asking the
interviewee open-ended questions and then, based on the responses, following up
with more specific ones. For example, good opening questions include, “tell me
what happened when you first realized that something was wrong,” “what did you
see when the train approached the curve,” and “what happened as you began to
slow down.” Follow-up questions to these open-ended ones can be of the type,
“what did you do next,” “what happened after that,” and “what was your
colleague doing at this time.”

A technique that many interviewers use, which forces them to listen to
interview responses, is to avoid writing questions in advance and instead to
write down beforehand the topics that need to be addressed. As each topic is
covered they cross off that item and proceed to the next one. By focusing on
the interviewee responses, interviewers can follow up with questions that
correspond to the interviewee's train of thought, rather than disrupting it
and thereby interfering with his or her ability to access memory of the event.
Also, by adhering to topics rather than questions, interviewers allow
interviewees to focus on a single topic at a time, thus minimizing disruption
to their memories. Exceptions to adhering to topics should be made only if the
interviewee response calls for it. That is, if the interviewee, in the
response, jumps to another topic, it would be appropriate to follow up with a
question related to that topic, such as one asking the interviewee to elaborate
on or provide more detail about the answer just given.

Interviewers should strive to keep their questions as brief as possible. The
less the interviewer talks, the more the interviewee will be encouraged to do
so. Similarly, if interviewees pause during a response, interviewers should
wait as long as appropriate to allow the interviewees to fully collect their
thoughts and complete their responses. Even if the pauses are longer than
would typically occur in a conversation, interviewers should avoid intruding
on interviewee thoughts until they are certain that the interviewee has
completed a response.

By the same token, questions that ask for yes or no responses should be
avoided, as they limit the information interviewees are asked to provide.
Interviewers should also avoid asking leading questions, for example, “did you
see the explosion that occurred,” and instead ask an open-ended question in its
place, for example, “what did you see,” to encourage the interviewee to provide
as many details as he or she can recall.


False Responses 


Accident investigations, unlike forensic or criminal investigations, do not 
aim to identify a perpetrator of a crime but to understand the cause of an 
accident. Although in some jurisdictions operators whose errors have led to
accidents may be prosecuted for their role in the accidents, for the accident
investigator, the more cooperative the interviewer can be with the interviewee,
the more information the interviewee can reasonably be expected to provide,
independent of prosecutor actions. Thus, it behooves interviewers to avoid
approaching the interviewee as someone who has potentially caused an accident,
but rather as someone whose information can help prevent future accidents.
Nonetheless, interviewers may encounter interviewees who believe that they have
something to hide, even when they face no criminal or civil action as a result
of the accident.

When encountering an interviewee who is believed to be answering ques- 
tions falsely, accusing the interviewee of such will do little to enhance coop- 
eration. After all, the interviewee may be telling the truth. Rather, the best 
approach is to rephrase the question, perhaps repeat the question, and then 
move on to a different topic. The fact is that interviewers have little ability 
to get interviewees to change their responses when answers are believed to 
be false.


Finding, Scheduling, and Selecting a Location for the Interviews


Eyewitnesses 


Local media, law enforcement and rescue personnel, as well as those working 
near the event, often establish contact with eyewitnesses. Law enforcement 
and rescue officials are usually the first to arrive at the scene, are experienced 
in locating witnesses, and can encourage them to cooperate with investiga- 
tors. As such they may be able to assist investigators in identifying and locat- 
ing eyewitnesses. In addition, media representatives, who generally arrive 
on scene quickly and are usually adept at locating and interviewing eyewit- 
nesses, may also be able to help locate eyewitnesses. Media representatives 
can also be asked to inform the public of a need for eyewitnesses and thus, 
help disseminate to eyewitnesses the investigators’ need for their assistance. 

Scheduling the Interview. There is usually extensive media attention and 
general discussion after a major accident, and eyewitnesses have difficulty 
avoiding exposure to accounts of the event. Therefore, eyewitnesses should 
be interviewed as soon after the event as possible to reduce their exposure 
to potentially contaminating information and other adverse influences on 
their recall. 

Selecting a Location for the Interview. Eyewitnesses may recall more of an 
event at the locations at which they witnessed it. Reencountering the cues 
associated with the event, such as buildings, objects, hills, or trees may 
help them remember additional information (Fisher and Geiselman, 1992). 
However, if doing so will delay interviewing others, the delay could out- 
weigh the advantages of site-related memory enhancements. Consequently, 
if eyewitnesses can be interviewed quickly at the locations at which they 
observed the event one should do so, but otherwise the interview should not 
be delayed. 


Operators 


The company managing the system should be able to coordinate interviews 
with operators who were on duty during the event. 

Scheduling the Interview. Operators, like other interviewees, are subject to
memory contamination, and this would ordinarily call for interviewing them
quickly after an event. However, two factors argue for delaying their
interviews for 24 to 48 hours after the event. First, it often takes several
days to obtain even routine system-related information after an event. Delaying
operator interviews allows investigators to examine records, talk to
eyewitnesses, and learn about the event, enabling them to ask more
sophisticated questions and to obtain more valuable information than they
would likely obtain immediately after an event. Second, operators involved in a
major event are often distraught and may have difficulty concentrating.
Delaying the interview allows them to compose themselves. A 1- to 2-day delay
after the event is usually sufficient to serve both operator and interviewer
needs.

If the operator was injured in the event, it is important to obtain the attend- 
ing physician's approval before the interview. The physician should be able to 
assure you that the operator would not be harmed by the interview, is capa- 
ble of comprehending and responding to questions, will not have answers 
affected by the medications prescribed, and will have full awareness of the 
interview. If the physician cannot so assure you, wait until after the opera- 
tor has ceased to use the medications, when they are no longer in his or her 
system, and he or she is otherwise able to fully comprehend and respond 
accurately to questions before initiating the interview. 

Selecting a Location for the Interview. The location of an operator interview 
is critical. The wrong setting could increase the operator's anxiety and dis- 
comfort, and thus hamper his or her recall. Ideally, operators should be inter- 
viewed in a professional setting, free of distractions such as busy hallways, 
elevators, and noisy streets, and equipped with comfortable chairs and a 
table or desk. A hotel conference or meeting room is often a good choice. 
Telephones, pagers, cell phones, and other potential distractions should be 
disengaged or turned off, window shades or curtains drawn to minimize 
outside distractions, and thermostats set to a comfortable temperature. 
Operating and training manuals and other pertinent material should be 
accessible. Finally, water or other refreshments for the operator should be 
available throughout the interview. 

When an operator is having difficulty recalling events, conducting the
interview in a system mock-up or simulator may enhance recall. If this is not
possible, diagrams or photographs of system components and of the equipment
used at the time of the event should be provided for reference.


Those Familiar with Critical System Elements 


Company management should assist in finding and scheduling interviews 
with company personnel. They can also usually help to find friends and 
close family members of the operators if needed. 

Scheduling the Interview. It is not critical to minimize the exposure to 
memory contaminants of those who are familiar with critical system ele- 
ments, and these interviews can safely be delayed for several weeks after 
the event to allow investigators to complete interviews with those whose 
recall is more sensitive to the effects of time delays. Nevertheless, family and 
acquaintances of the operators should be interviewed quickly because their 
memories of the operator's activities could be contaminated by exposure to 
subsequent accounts. 

Location of the Interview. Whether the interviewee is a family member or 
a company supervisor, the interview location should be professional and 
business-like, free of distractions, and large enough to comfortably
accommodate those participating in the interview. Interviewees should have
access to reference material and other helpful items as needed.




Administrative Concerns 


Interviewers must attend to numerous logistical and administrative details
to ensure a successful interview. These details, which are critical to ensuring
that interviews yield the maximum possible information, include


* The interview record
* Operator team members
* Multiple interviewers
* Information to provide interviewees


The Interview Record 


The interview's written record is the medium through which interview data 
are conveyed to others. Deficiencies in the record lessen its value, regardless 
of the overall interview quality. Several interview documentation methods 
are available, each with particular advantages and disadvantages. 

Video or audio recordings provide the most accurate interview record 
and usually require the least interviewer effort during the interview. 
Therefore, if possible, investigators should strive to record the interview. 
Interviewees generally adapt quickly to the presence of recording equip- 
ment. Interviewers must also follow appropriate rules governing interview 
recordings because some jurisdictions prohibit recording interviews without 
interviewee approval and interviewees may be unwilling to permit record- 
ing their interviews. 

Professional transcribers or court reporters can generate interview
transcripts; however, the cost may be substantial and transcribers are often
unfamiliar with technical terms that may arise. In addition, they are more
obtrusive than recording devices, although interviewees in time usually
adapt to their presence as well.

Interviewers who are adept keyboarders can also enter interviewee 
responses directly into a laptop or tablet during an interview, an inexpen- 
sive alternative to transcribing the interview. Spell-checkers facilitate this 
method by allowing keyboarders to input the data quickly without the need 
for much accuracy. However, interviewers may find it difficult to ask ques- 
tions, follow the responses, and keyboard the responses simultaneously. 
Many laptops and tablets can also record the interview directly to audio
files, thus providing an audio record to support the transcribed notes.

A common method of preparing a written record calls for interviewers to
take their own notes during the interview and add more details from
memory afterward. This method is inexpensive and requires little writing
skill, but it is the least accurate of the interview documentation techniques. 
If multiple interviewers participate in the interviews, they should combine 
their notes after completing the interview, to provide a single complete set of 
interview notes. 


Operator Team Members 


Operator team members should be interviewed individually, and not in the 
presence of other team members, to reduce the likelihood that one would 
influence another's responses. Operators should also be asked not to discuss 
the interviews with their colleagues, although enforcing this prohibition 
may be difficult. 


Multiple Interviewers 


The number of interviewers should be kept to the minimum possible to 
maximize interviewer-interviewee rapport. The lower the number of inter- 
viewers who are present, the less intimidating the process will be to the 
interviewee and the more readily interviewer and interviewee can establish 
rapport. However, multiple interviewers can also enhance the interview
process. They can interview numerous eyewitnesses simultaneously, increasing
efficiency, and, in the case of operators or those familiar with system
elements, they can bring additional perspectives to the interview, thus
increasing its scope and depth.

To avoid potential administrative difficulties among multiple interviewers, 
it is important to establish and maintain guidelines governing the conduct 
of the interviews to enhance the likelihood that the interviews go smoothly. 
These guidelines should be articulated, understood, and agreed upon before 
the interviews begin. These should include identifying the lead questioner, 
establishing the order in which interviewers question the interviewee, and 
determining how to deal with interruptions. 

Order of Interviewers. The lead interviewer is generally the group’s leader
and, therefore, the person who questions the interviewee first and sets the
tone of the interview. Thereafter, each interviewer should be given the
opportunity to ask an initial set of questions and at least one set of
follow-up questions, in the order appropriate to the group.

Interruptions. Interruptions to either interviewees or interviewers should 
be avoided, except when an interviewee does not understand the question 
or his or her response has deviated from the thrust of the question. In that 
event, only the lead interviewer should be permitted to bring the discussion 
back on track. Otherwise, interviewers should note additional questions
they may have, and then ask those questions during follow-up questioning.


Information to Provide Interviewees 


Before the interview, interviewers should identify the information that they 
will be willing to provide interviewees. In general, information that cannot 
be shared with those outside the investigation, is speculative or analytical, 
or can influence the recollections of subsequent interviewees, should not be 
discussed. 


Concluding the Interview 


Conclude the interview when all pertinent information sought from the 
interviewee has been obtained, and when it is clear that the interviewee has 
no additional information to offer. Interviewers should not have difficulty 
determining when they have reached this point. Interviews should not be 
concluded for other reasons, such as the need to attend to other investigative 
activities. If a potential conflict could interfere with an interview, the
interview should be rescheduled for a time when it can be conducted without
disruption.

After responding to all questions, interviewees should be asked whether
they have additional information to offer, and whether there are important
questions about the event that they have not been asked. Interviewers should then
ask the interviewees whether they have questions, and give them previously 
agreed upon information, as needed. The lead interviewer should then give 
the interviewees his or her business card or contact information. Finally, 
interviewers should thank the interviewees for their cooperation and assis- 
tance to the investigation. 





Interpreting Interview Data 


Because of the many factors that could influence interviewee responses, 
interview data should not be considered in isolation, but only in conjunction 
with other investigative data. 


Eyewitnesses 


Information from eyewitnesses should be used to supplement, but not sup- 
plant, other data because of the potential contamination of their recollections. 
Nonetheless, eyewitnesses can add details that may not be available from 
other sources. However, if data from more accurate and reliable sources
contradict eyewitness data, credence should be given to the more reliable data.


Operators


Operators’ first-hand experiences in the complex system in which the event 
occurred can considerably enhance one's understanding of the nature of
critical errors. Operators with good recall of the event, who sincerely wish 
to help the investigation, can provide data that simply cannot be obtained 
elsewhere. Nevertheless, do not be surprised if operators have difficulty 
recalling specific details in the very dynamic or stressful environment in 
which events in complex systems often unfold. They are often so engrossed 
in responding to the event and preventing an accident that they may have 
difficulty afterward recalling details about that event. 


Those Familiar with Critical System Elements 


System managers, designers, and others who are knowledgeable about a 
system can provide information that complements information from other 
sources. For example, instructors can describe the meaning of comments 
found in training records, managers can explain the intent that lay behind 
statements they wrote in performance appraisals, and equipment designers 
can describe their design philosophy and its manifestation in the operating 
system. Those operating in the system's blunt end can help explain the devel- 
opment of operating procedures, training programs, and other potentially 
critical investigative issues. 

The value of the information that family and acquaintances of the opera- 
tors provide depends on the availability of other supporting information, as 
well as on the relevance of the data to the event. Family and acquaintances 
of the operators can provide especially valuable information when there is 
little information available from other sources. Family members or friends of 
operators injured or otherwise unavailable may be able to give investigators 
accounts of the operators’ activities in the 72 hours before the accident, while 
their colleagues can describe their work habits and attitudes to investigators. 





Summary 


People are subject to memory errors, but interviewers can exercise some 
control over potential influences on these errors and increase the amount 
and value of the information interviewees provide. Error investigators
generally question three types of interviewees: eyewitnesses, operators, and
those familiar with critical system elements. Each type of interviewee has
specific concerns and information to provide. Interviewers seek different
information from each, and should recognize their different needs when
eliciting the information.

Interviewers should identify the type of information that the interviewees
can provide, and identify the issues to elicit the desired information before 
the interview. Interviewers should ask questions that correspond to the 
sequence of issues, by beginning with open-ended questions and following 
up with more specific questions that address points that the interviewee has 
made. Suggestions are provided for asking questions and chronicling the 
information from interviewees, with illustrations of effective and ineffective 
interviewing techniques. 


CONDUCTING INTERVIEWS: THE INTERVIEW PROCESS 


* Identify the information sought, then identify and locate inter- 
viewees that can provide that information. 


STRUCTURING THE INTERVIEW: BEGINNING 


* Thank the interviewees for their cooperation. 


* Introduce each interviewer to the interviewees, giving titles 
and affiliations, if multiple interviewers are participating. 


* Describe the purpose of the interview and mention the value 
of the information sought. 


* Review the interview guidelines with interviewees and ask 
them if they have questions before beginning. 


THE QUESTION SEQUENCE 

* Determine beforehand the order of issues to be addressed in 
questioning each interviewee. 

* Introduce new issues after each issue has been addressed in 
turn. 

* Use one of two types of sequences of issues with interviewees,
chronological order or order of importance.

* If important, address issues that the interviewee raised while 
discussing another issue, even if it means going out of sequence. 


FOLLOW-UP QUESTIONS 


* Use follow-up questions when an interviewer has not pursued
an issue that an interviewee has raised, or when an interviewee
has raised multiple issues in a response to one question.


* Ensure that other interviewers wait until their turns to follow 
up on an issue rather than disrupt other interviewers. 


* Allow each interviewer at least two opportunities to ask questions,
one to ask the initial questions and a second to ask follow-up
questions.


ATTENDING TO THE INTERVIEWEE


* Show attention to the interviewee at all times.

* Be aware of and avoid nonverbal cues that may unwittingly be
sent to the interviewee.

* Ensure that the interviewee is comfortable and that the interview
location is free of distractions.

* Stop the interview if interviewees appear uncomfortable or
begin to lose their composure.

* Do not offer the interviewee career or personal assistance, but
demonstrate concern for the interviewee.


FALSE RESPONSES


* Rephrase or refocus the questions if there is reason to believe
that an interviewee has answered questions falsely.

* Do not express disapproval or attempt to coerce a truthful
response from the interviewee.

* Do not use a prosecutorial tone in asking questions.


ASKING QUESTIONS


* Begin questions with verbs.

* Keep questions as brief as possible.

* Phrase questions to encourage interviewees to be as expansive
as possible.

* Progress to more focused questions that follow up on the
points interviewees make in response to initial questions.

* Attend to the interviewee answers and, to the extent possible,
base questions on those answers.

* Avoid questions that permit one or two word answers, for
example, yes or no, unless following up on a response to a
particular interviewee answer.

* Avoid asking questions from a predetermined list, in the
predetermined order.

* Identify issues to be addressed and ask questions that relate to
those issues.


(a) Each certificate holder shall-

(1) Maintain current records of each crewmember...that show whether 
the crewmember...complies with...proficiency and route checks, air- 
plane and route qualifications, training, any required physical examina- 
tions, flight, duty, and rest time records. 

(2) Record each action taken concerning the release from employment or 
physical or professional disqualification of any flight crewmember...and 
keep the record for at least six months thereafter. 


14 Code of [United States] Federal Regulations, Part 121.683


Introduction


Investigators routinely obtain and examine information maintained in vari- 
ous types of records during investigations, records that contain information 
describing characteristics of the systems, operators, and the companies that 
operate them. Recognizing the value of written documentation and knowing 
when and how to apply the information they contain to an error investiga- 
tion are critical investigative skills. In this chapter the types of documenta- 
tion available to investigators will be reviewed, preparation for examining 
written documentation discussed, qualitative aspects of the data considered, 
and the application of the data to various situations examined. The docu- 
mentation is referred to as “written” although most such records, whether 
maintained by company, regulator, or small business, are largely electroni- 
cally stored today. 




Documentation 


In some industries regulators mandate the data companies are to collect, the 
frequency and regularity of data collection, and the format of the data main- 
tained. For example, the Federal Aviation Administration requires airlines 
to retain substantial information on pilots, airplanes, simulators, training
programs, maintenance activities, and dispatch releases. Whether required
or not, most companies maintain documentation on employees, procedures,
training programs, and the like, information that investigators will want
to examine because of its potential relevance to error antecedents. This
information can take several forms.


Company-Maintained Documentation


Personnel Records


These contain information pertaining to operators’ educational and employ- 
ment histories, and their current jobs, such as supervisor appraisals, letters 
of commendation or reprimand, and other relevant material. 

Investigators of the grounding of the tanker Exxon Valdez, in Alaska’s 
Prince William Sound (National Transportation Safety Board, 1990), used this 
information to help identify some of the antecedents that led to the
grounding. In performance appraisals completed 3 years before the accident,
the third mate, who was the senior officer on the bridge and was overseeing
the tanker’s path at the time of the grounding, had been rated low on several
critical performance elements. As investigators report (National
Transportation Safety Board, 1990),


In one performance appraisal as a third mate his “overall effectiveness” 
had been evaluated as “high,” one rating below “outstanding.” The two 
lowest ratings he received as a third mate were given to him while he was 
on the Exxon Jamestown in 1987 and contained the following comments: 
“performs adequately” in the rating categories of “seeks advice or guid- 
ance at the appropriate time and informs supervisor when appropriate” 
and “demonstrates thorough knowledge of ship and its handling char- 
acteristics.” In a summary of employee weaknesses, the evaluator wrote, 
“He [third mate] seems reluctant or uncomfortable in keeping his supe- 
rior posted on his progress and/or problems in assigned tasks.” (p. 33) 


The supervisor’s comments gave investigators background information to 
help understand and explain the errors the third mate committed during the 
accident, where he failed to inform his superiors of his difficulties attempt- 
ing to steer the vessel in the confined waters of Prince William Sound. 
Unfortunately, since then companies have tended to decrease the amount of 
these types of comments in written documentation, to the detriment of error 
investigations. 


Training Records 


Training records contain information on operator training that the com- 
pany and others have conducted, and may include test scores and other 
performance measures. Instructor comments and other information, which
go beyond test scores, may also be included.


Medical Records 


Company-maintained medical records contain data associated with regula- 
tor- or company-mandated medical requirements, such as physical examina- 
tions, vision, and hearing evaluations. The records may also include medical 
information outside of direct company control, such as descriptions of opera-
tors’ medical evaluations, treatment, and prescriptions that were paid for by
company-sponsored medical insurers. 

As described in Chapter 10, investigators concluded that the master of 
the Exxon Valdez had been impaired by alcohol at the time of the event. 
Information in the company's medical records of the master indicated that 
he had been treated for alcohol abuse about 4 years before the accident. As 
investigators write, 


An Exxon Individual Disability Report, signed by the attending phy- 
sician and dated April 16, 1985, showed that the master was admitted 
to a hospital on April 2, 1985, and “remains in residence at the present 
time.” The report stated: “He is a 38 yo W/M [year old white male] who 
has been depressed and demoralized; he’s been drinking excessively, 
episodically, which resulted in familiar and vocational dysfunction.” 
A treatment program was suggested that included a recommenda- 
tion that he be given a leave of absence to get involved in Alcoholics 
Anonymous, psychotherapy, and aftercare. (National Transportation 
Safety Board, 1990, p. 32) 


Because the company maintained this information about the captain in its
records, investigators believed that the company should have monitored his
alcohol use closely. That he was believed to have consumed alcohol while on
duty on the vessel before the accident suggested that the company's oversight
was deficient, and that either the rehabilitation program he had entered was
flawed or his commitment to rehabilitation was poor.

Some operators may attempt to conceal information from their employer 
about potentially adverse medical conditions, and one should not assume that 
a company’s medical records contain all relevant medical information about 
an operator. The discussion in Chapter 5 of the 1996 New Jersey rail accident 
(National Transportation Safety Board, 1997), in which the train operator 
was unable to recognize the stop signal because of his visual impairment, 
demonstrates that company-retained medical records may be incomplete. 
The train operator had successfully hidden his diabetes and the diabetes- 
related visual impairment from his employer, and the company’s evalua- 
tion of the operator’s vision failed to detect his impairment. Investigators 
obtained the information they needed to determine his medical condition
from the records of his personal physician. 


Documentation Not Maintained by the Company 


Sources unrelated to the operator’s employer might also retain operator- 
related information. For example, financial, legal, and family records, as well 
as records of driving history, may reveal aspects of an operator’s behavior that, 
although not necessarily job-related, may nevertheless affect performance. 

The information may reveal the presence of stressors that could degrade 
operator performance. The value of the information depends on the rela- 
tionship of this information to antecedents to these errors. However, unlike 
company-maintained documentation, access to such information may be 
restricted by regulation to law enforcement personnel, and error investiga- 
tors may need their assistance to examine it. 


Information from Legal Proceedings 


Pending civil actions or criminal charges are stressors and may adversely 
affect operator performance. To take an extreme example, nearly all would 
consider criminal charges and a possible prison sentence to be stressful. By 
contrast, pending civil action may not necessarily be stressful. Although 
many find civil action stressful, others do not, particularly if they are not lia- 
ble for financial penalties or legal expenses. For example, in the United States 
drivers involved in automobile accidents could be sued for damages that 
considerably exceed the direct costs of the accident itself. However, because
most jurisdictions require drivers to be insured for protection against such
losses, the drivers' own accident-related expenses may be negligible, and
the experience may therefore not be stressful.


Family-Related Information 


Family-related information may reveal likely stressors such as divorce, child 
custody disputes, or similar experiences. 


Driving History 


A record of an operator’s driving transgressions may suggest a pattern of 
behaviors, attitudes, and substance dependencies that are pertinent to an 
error investigation. Because many operators are exposed to law enforcement 
authorities when driving, those with records of infractions for driving while 
intoxicated may also manifest an alcohol or drug dependency because the 
infractions suggest an inability to avoid alcohol use outside of the work set- 
ting (McFadden, 1997). 

Nevertheless, while information contained in driving history and similar
written documentation may relate to operator antecedents, that information
pertains to events occurring outside of the system.
Therefore, the information should not be applied directly to the examination 
of the errors, but should instead complement other data about the operator. 
The data can also suggest avenues of inquiry that investigators may wish to 
pursue regarding an operator's error, such as possible chemical dependency 
or financial reverses. 





Information Value 


As with other accident investigation data, whether from written documenta- 
tion, recorded media, or interviews, investigators should evaluate written 
documentation data to assess their value. Quality may vary among and even 
within a single source such as company records. Poor-quality information 
contributes little to an investigation, regardless of its relevance to the event. 

In general, the factors that can affect the value of written documentation 
data are 


* Quantity
* Collection frequency and regularity
* Length of time since collected
* Reliability
* Validity


Quantity 


All things being equal, the more data collected about a parameter the more 
that is learned about it and the greater the value of the data to the investiga- 
tion. Similarly, the greater the number of parameters documented, the more 
that is revealed about potential antecedents. Because of the impossibility of 
infinitely documenting a system parameter, written documentation data 
are considered to be samples of the universe of data that can be potentially 
derived about the relevant parameters. Hence, the more data obtained, the 
closer the measure corresponds to dimensions of the actual attribute rather 
than to a sample of it. 

For example, a single health measure, heart rate or pulse, provides a limited 
portrait of an individual's cardiovascular state. Generally, health care pro- 
viders measure pulse in 15- to 30-second intervals to derive a baseline rate. 
However, to obtain more accurate measures they could document heart rate 
over a 1- to 2-hour period. Although the obtained data would more closely 
correspond to a person’s baseline pulse on that day, the cost of obtaining the
data would not justify the benefits that would be derived from the increased 
accuracy. Therefore, the relatively brief measures of heart rate provide data 
that effectively, but not perfectly, measure the parameter of interest. 


Collection Frequency and Regularity 


The more often a trait is measured, the better the data can reveal changes in 
measures of the trait. Frequent measures provide a more complete portrait 
of a parameter than would infrequent measures. For example, people who 
weigh themselves only once every 5 years would not learn of small weight 
variations in that period. By contrast, if they weighed themselves 100 times 
in the 5-year period, they would obtain a more complete picture of their 
actual weights than possible from a single measure. 

Frequent measures of attributes should also be reasonably spaced to reveal 
variations in the trait over time. Using the same example, 100 weights
measured at regular intervals of about 2 to 3 weeks describe weight more
accurately over the 5-year period than 100 weights measured in 1 month of
that 5-year period.
The greater the number of measures of a parameter, and the more equal the 
intervals between those measures, the greater the value of the obtained data. 
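
The contrast between the two sampling patterns can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `sampling_profile`, the 5-year period expressed as 1,825 days, and the two sets of measurement days are assumptions made for the example, not data from any investigation.

```python
def sampling_profile(days, period_days):
    """Describe how a set of measurement days covers a review period."""
    ordered = sorted(days)
    # Unobserved stretches: before the first measure, between
    # consecutive measures, and after the last measure.
    edges = [0.0] + ordered + [float(period_days)]
    stretches = [later - earlier for earlier, later in zip(edges, edges[1:])]
    return {
        "count": len(ordered),
        # Fraction of the period lying between the first and last measure.
        "coverage": (ordered[-1] - ordered[0]) / period_days,
        "longest_unobserved_days": max(stretches),
    }


PERIOD_DAYS = 5 * 365  # hypothetical 5-year review period

# 100 weights spaced evenly across the whole period (assumed values).
evenly_spaced = [i * PERIOD_DAYS / 99 for i in range(100)]

# 100 weights all taken within one 30-day month in the middle of the period.
clustered = [900 + i * 30 / 99 for i in range(100)]

even_profile = sampling_profile(evenly_spaced, PERIOD_DAYS)
clustered_profile = sampling_profile(clustered, PERIOD_DAYS)

print(even_profile)
print(clustered_profile)
```

Under these assumed values both patterns contain 100 measures, but the evenly spaced set spans the whole period, while the clustered set leaves more than 2 years unobserved on either side of the single month.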


Length of Time Since Collected 


The closer to the time of the incident or accident in which measures of a 
parameter are obtained, the closer the measures will correspond to the 
actual parameter value at the time of the event, which is the period of the 
investigators’ greatest interest. Investigators may find that data obtained 
after, but close to the time of an incident or accident, are more valuable than 
data obtained before the event, provided that the experience of the event 
did not affect the value of the parameter. In this way, a measure of an opera- 
tor’s performance taken 2 days after an accident will likely be more valuable 
to investigators than a comparable performance measure obtained a year 
before, so long as the operator’s performance did not change as a result of the 
accident experience or in association with it. 


Reliability 


Reliability refers to the consistency of measures of a trait. Reliable mea-
sures vary little between measurements, while unreliable measures vary
considerably. Using the illustration of weight, if people were to weigh themselves on
well-calibrated scales throughout the day, the weights would change little, 
except for minor diurnal weight fluctuations associated with meals, fluid 
loss, etc. However, if they were to weigh themselves on several uncalibrated 
scales in the same period, and those weights varied by several pounds or 
kilograms, the differences would most likely result from differences in the 
scales’ accuracy rather than changes in their weights, because variations of
that magnitude are rare.
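
The weight illustration can be sketched as a simple consistency check. The readings below, the helper name `is_consistent`, and the 0.5-kilogram tolerance are hypothetical values chosen for the example; an investigator would need to set a tolerance appropriate to the measure in question.

```python
from statistics import pstdev

def is_consistent(readings, tolerance):
    """Return True when repeated readings of one trait vary little.

    The spread is the population standard deviation of the readings;
    the tolerance is an assumed threshold, not an established standard.
    """
    return pstdev(readings) <= tolerance


# Hypothetical same-day weights, in kilograms, for one person.
calibrated_scale = [80.1, 80.0, 80.3, 80.2, 80.1]
uncalibrated_scales = [78.4, 81.9, 80.2, 83.0, 77.6]

calibrated_spread = pstdev(calibrated_scale)
uncalibrated_spread = pstdev(uncalibrated_scales)

print(round(calibrated_spread, 2), is_consistent(calibrated_scale, 0.5))
print(round(uncalibrated_spread, 2), is_consistent(uncalibrated_scales, 0.5))
```

A spread of several kilograms within a single day, as in the second set of assumed readings, points to the measuring instruments rather than to the person being measured.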


Validity 


The relationship between the measurement of a parameter and its actual 
value is known as validity. The closer a measure of a trait corresponds to 
the value of that trait, the greater its validity. For example, a test used to 
select candidates to operate a nuclear reactor should predict how well they 
would operate the reactor. A vocabulary test that applicants complete to be 
selected as reactor operators may be a valid measure of verbal achievement, 
but it would likely be of limited value predicting performance as a reactor 
operator. The knowledge needed to operate reactors extends well beyond 
vocabulary knowledge. A more valid measure would directly assess skills 
needed to operate reactors, developed from closely observing operator tasks. 





Changes in Written Documentation 


It is not unusual to find that over time the nature of information contained 
within written documentation changes. Measures of operator traits may 
change as operators gain experience; alternatively, measurement standards 
may change as systems evolve. Changes in two elements of written docu-
mentation, stable characteristics and organizational factors, are particularly
noteworthy as they may reveal much about the operator and the operator’s
employer.


Stable Characteristics 


Assessments of particular attributes, characteristics, or conditions should 
reveal what will be referred to as “stable characteristics,” traits that tend to 
remain fairly stable from year to year. Performance appraisals, for example, 
generally include an assessment of overall job performance, a fairly stable 
measure over the short term. Someone who performs well one year would 
be expected to perform well the next. Because of its stability, marked incon- 
sistencies in an operator’s overall job performance over time would be cause 
for additional inquiry. 

Investigators of the 1994 accident involving an FAA-operated aircraft
used to inspect navigation aids, discussed in Chapter 6, found that the
stable characteristics of the pilot in command’s performance data
contained in his personnel records were positive, with one inconsistency
(National Transportation Safety Board, 1994). His supervisor had
consistently evaluated the pilot’s performance positively; however, 6 months
before the accident he reprimanded the pilot for a performance-related event,
and the reprimand was inserted into the pilot's performance records.

After interviewing the pilot’s peers and supervisors, investigators 
resolved the inconsistency between his general performance over several 
years and that described in the reprimand. Co-workers provided a differ- 
ent account of the quality of the pilot’s performance than that described in 
the stable characteristics of the performance appraisal results. Interviewees 
described instances in which the pilot disregarded safe operating practices, 
information that corresponded to the gist of the letter of reprimand and his 
performance at the time of the accident, but not to the stable characteristics 
of the performance appraisal results. Because the action that the pilot had 
taken, which resulted in the letter of reprimand, was reported by another 
pilot, the supervisor was forced to respond to the action. Therefore, the 
value of the stable characteristics was diminished as the inconsistency was 
explained. 


Organizational Factors 


Written documentation may contain data that reveal characteristics of the 
company as well as the person. In the accident described above, for example,
investigators resolved the discrepancy within the pilot’s personnel records
by interviewing his peers and supervisors. Their comments raised
critical issues regarding the performance of the supervisor and the organiza- 
tion in overseeing the pilot and their response to safety-related issues. 

Comparing written documentation data of one operator to similar data of 
other operators in an organization may also give insights into company
policies and actions. To illustrate, consider a locomotive engineer whose
personnel records show numerous citations for rule violations in a 5-year
period, an individual who may well be a poor performer. However, if the
records of other engineers in that railroad contained roughly equivalent
infractions over similar intervals, then the engineer’s performance would be
“average” among the engineers.
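
One way to make such a comparison concrete is to express the operator's record relative to the spread of peer records. The sketch below is illustrative only: the function name `relative_standing` and all citation counts are invented for the example, and a standardized score is merely one of several reasonable ways to frame the comparison.

```python
from statistics import mean, pstdev

def relative_standing(operator_count, peer_counts):
    """Return how many peer standard deviations the operator's count
    lies above (positive) or below (negative) the peer average.

    Returns None when the peer counts show no spread at all, because
    the comparison is then undefined.
    """
    peer_sd = pstdev(peer_counts)
    if peer_sd == 0:
        return None
    return (operator_count - mean(peer_counts)) / peer_sd


engineer_citations = 8  # hypothetical citations in a 5-year period

# Hypothetical 5-year citation counts for peers at two different railroads.
railroad_a_peers = [6, 8, 7, 9, 7, 8]
railroad_b_peers = [1, 0, 2, 1, 1, 1]

standing_a = relative_standing(engineer_citations, railroad_a_peers)
standing_b = relative_standing(engineer_citations, railroad_b_peers)

print(round(standing_a, 2))
print(round(standing_b, 2))
```

With these assumed figures the same eight citations are unremarkable at the first railroad and far outside the norm at the second, which would direct attention to the company's practices in the first case and to the individual in the second.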





Summary 


Companies generally maintain data in three types of written documenta-
tion: personnel, training, and medical records. The data in these records may
provide information about the operator and his or her employer. Driving his- 
tory, and financial and legal records may also contain information that could 
help to identify operator-related antecedents to error. 

The quantity of information about a trait, the frequency and regularity
with which the information was collected, the length of time since it was
collected, and the reliability and validity of the measures that provide the
data affect the quality of data in written documentation. Investigators should
examine general trends within the data, departures from consistency, relative
rate and direction of change, similarity to information in the company records
of others, and organizational factors when reviewing data in written
documentation.


APPLYING WRITTEN DOCUMENTATION 
TO INVESTIGATIONS 


PREPARATION 


* First identify the information likely to be pertinent to the event, 
and the most likely sources of that information. 


* Identify and locate experts when reviewing unfamiliar infor-
mation, such as financial and medical data, for assistance in
interpreting the data.


MEDICAL INFORMATION 


* Examine operator medical records of both company-spon-
sored medical care and sources independent of the company,
if available.

* Document operator visits to health care providers that occurred 
within the 3 years before the event, results of diagnostic tests 
and health care provider's diagnoses, treatments, and pre- 
scribed and recommended over-the-counter medications. 


FAMILY INFORMATION 


* Obtain family information if there is reason to believe that
the operator experienced family-related difficulties near the
time of the event. Seek law enforcement assistance to gain
access to the information if necessary.


ASSESSING THE DATA 

* Note the amount of data available on a given trait, the fre- 
quency and regularity of data collection, and the reliability 
and validity of the data. 

* Determine the presence of stable characteristics of the data that 
pertain to each trait. 

* Resolve inconsistencies in the data, generally by interviewing 
those who can comment on the discrepant information. 

* Look for information on the company as well as on the opera- 
tors within company-maintained documentation. 


AFTER REVIEWING THE DATA

* Summarize the major points of the information contained
within the written documentation.

* Contact the person familiar with the information if there is still
uncertainty about the meaning of some of the information.


Maintenance and Inspection











If some evil genius were given the job of creating an activity guaranteed 
to produce an abundance of errors, he or she would probably come up 
with something that involved the frequent removal and replacement of
large numbers of varied components, often carried out in cramped and 
poorly lit spaces with less-than-adequate tools, and usually under severe 
time pressure. 


Reason and Hobbs, 2003 
Managing Maintenance Error 





Introduction 


Complex systems need to be maintained to be able to continuously oper- 
ate in working order because components wear out, cables stretch, bolts 
loosen, fluids become depleted or dirty, and critical elements fail. Preventive 
maintenance reduces the likelihood of such occurrences resulting in compo- 
nent failures or malfunctions during system operations. Systems generally 
undergo two types of maintenance, scheduled and unscheduled. Scheduled 
maintenance is preventive, intended to inspect and replace components, 
fluids, belts, etc., before they wear out or fail. Unscheduled maintenance is 
directed to repair equipment anomalies. 

All of the antecedents to error described in these chapters apply to 
maintenance errors, which may result from antecedents in training, opera- 
tor team operations, and so on. What distinguishes maintenance errors from 
the others described here are antecedents unique to maintenance and the 
manner in which they can affect maintenance operations. To some extent 
this is because the environment in which maintenance is typically con- 
ducted is unique. As Hobbs and Williamson (2003) observe regarding avia- 
tion maintenance: 


Aircraft maintenance is performed in an environment that contains 
many potential error producing conditions. Maintenance workers rou- 
tinely contend with inadequately designed documentation, time pres- 
sures, shift work, and environmental extremes. (p. 187) 


Despite the importance of maintenance, until recently researchers paid scant
attention to maintenance errors. Exceptions include Boeing Corporation's 
study of commercial aircraft maintenance errors, and the maintenance error 
decision aid (or MEDA) they developed. This was designed to help mainte- 
nance personnel identify the antecedents to maintenance errors and imple- 
ment strategies to reduce the likelihood of their reoccurrence (Rankin, Hibit, 
Allen, and Sargent, 2000). The Federal Aviation Administration has also 
devoted substantial effort to understand maintenance and inspection errors, 
greatly enhancing our understanding of maintenance operations. This chap- 
ter will examine the maintenance environment and maintenance tasks to 
explain how antecedents to error in maintenance tasks can occur. 





Maintenance Tasks 


Depending on the complexity of the system, maintenance may be carried 
out by one or a team of operators performing numerous, often unrelated 
tasks. The number of maintenance personnel and the number of tasks they
perform depend, of course, on the complexity of the task. Often one team
of individuals assigns and describes the task to be done, another team
performs the maintenance, and a different one inspects the results of the
maintenance to verify its quality. As Ward et al. (2010) describe, referring
to maintenance in commercial aviation,


Aircraft maintenance is a highly dynamic and regulated industry char- 
acterised, for example, by complex and interdependent systems and 
technologies, detailed and legally binding task procedures and docu- 
mentation, highly publicised accident rates and highly regulated man- 
agement systems to ensure reliability, efficiency and safety at all times 
(Corrigan 2002). Task analysis has revealed aircraft maintenance activity 
to be a complex socio-technical system requiring sustained coordina- 
tion, communication and cooperation between different work groups 
and teams including aircraft maintenance engineers (AMEs), crew man- 
agers, inspectors and hangar managers, various other subsystems, such 
as planning and commercial, stores, quality and engineering and exter- 
nal bodies, such as the regulators, the manufacturer, the customer and 
the airline, in order to ensure efficient and effective operations. (p. 248) 


Because maintenance technicians often receive limited or no immediate 
feedback on the quality of the tasks they complete, maintenance errors, such 
as using an incorrectly sized bolt or filling a component with the wrong 
fluid, may not be discovered until after the repaired or maintained item has 
been reintroduced into service. Hobbs and Kanki (2008) surveyed aviation 
maintenance errors reported to NASA's Aviation Safety Reporting System. 
The most common errors they identified were an omitted action, that is, an
action that should have been but was not carried out; inaccurate or incorrect
documentation; and an incorrect part installed. Further, they found that
particular parts or systems of the airplane tended to be associated with
certain types of errors and error antecedents.

Drury (1998) contends that maintenance and inspection tasks consist fun- 
damentally of a handful of discrete steps, in which technicians primarily 
interpret and diagnose, act (i.e., repair, replace, or inspect), and evaluate or 
inspect the maintenance action. Antonovsky, Pollock, and Straker (2014) 
describe the cognitive tasks involved in maintenance activities, tasks that 
largely fall into Drury’s interpret and diagnose, and evaluate categories. 


Problem solving in maintenance relies on correctly determining the 
source of a fault, deciding on the most efficient means of correction, and 
applying the solution effectively. All of these cognitive processes are 
required for successful corrective maintenance, and a logical flaw in any 
of these leaves the fault unresolved. (p. 315) 


The type of maintenance error an investigator may look for depends
largely on the step, among those Drury identified, to which the maintenance
action belonged. Diagnosis errors are different from maintenance action
errors, and the antecedents will likely be different as well. Further, unlike
tasks that system operators typically perform, maintenance tasks offer
multiple opportunities for error that may be independent of the task itself.
For example, maintenance tasks may vary from day to day, affording
maintenance personnel little of the opportunity that other operators have
to become well versed in a few tasks performed repeatedly as part of the
job. In addition, because maintenance tasks are often initiated by someone
or something other than the person or persons performing the task,
communication errors can arise between the dissemination of maintenance
instructions and their receipt and comprehension by the person performing
the task, a source of error different from those that system operators face.
Thus, the first step in identifying antecedents to maintenance error is to
examine the nature of the communication to the technician who performed
the task, to assess how well he or she understood the task to be completed
and the process needed to complete it.


Interpret and Diagnose 


Communicating the task to be performed. Maintenance technicians typically 
initiate a task after receiving either verbal or written instructions to do so. 
The instructions can describe an anomaly or present a maintenance task and 
list the actions that need to be performed to complete it. In response to an 
anomaly the maintenance specialist will diagnose the nature of the anom-
aly and select an appropriate action based on that diagnosis. Because the
initial reports of anomalies are often provided by those who interact with 
rather than maintain the system, the accounts may be insufficiently precise 
to enable maintenance technicians to readily diagnose the anomalies. Those 
who report the anomalies may be unaware of the needs of the technician, 
they may be in a rush to complete writing up the maintenance anomalies, or
they may simply be describing the anomaly from a user’s perspective rather 
than from that of the person who will repair it. Munro, Kanki, and Jordan 
(2008) describe many barriers to effective communication between pilots and 
maintenance technicians in transmitting information regarding anomalies 
from the operators to the maintenance technicians. 

By contrast, scheduled maintenance tasks are often generated from a
database of task schedules, with computer-generated instructions drawn from
a central source, such as a general maintenance manual, that describe the
steps needed to complete the task. Drury (1998) found that printed instructions
given to maintenance technicians occasionally contain flaws that can lead 
to maintenance errors. These include insufficiently sized font and incom- 
plete or poorly written instructions that lead maintenance technicians to 
misunderstand or not fully understand the nature of the task they are to 
perform. 

Because some maintenance tasks are performed over extended periods of 
time, maintenance personnel may not complete them within their duty peri- 
ods. In that event they will need to brief personnel in the subsequent shift on 
the tasks they had completed and the tasks that remain. Those receiving the 
information will then apply the instructions they received to the task to be 
performed. Again, deficiencies in either communicating or comprehending 
the instructions or in applying them to the task may lead to errors. Ambiguity 
in the verbal description of the system condition or the receiver’s misunder- 
standing of the information can also lead to error. Yet verbal instructions are 
common, particularly if tasks were partially completed. 

At the same time, maintenance tasks can offer opportunities to mitigate
errors that are typically absent in operating complex systems. Maintenance technicians
can, when faced with a challenging situation, stop their action and recon- 
sider it, or consult with a colleague or supervisor, without having to continue 
overseeing system operations (Hobbs and Williamson, 2002). The ability to 
focus on a situation exclusively without having to simultaneously monitor 
other aspects of system operations considerably eases some of the pressures 
maintenance technicians face in performing their jobs. 

Antonovsky, Pollock, and Straker (2014), in a study of maintenance in 
the offshore petroleum industry, found that 95% of the maintenance errors 
reported by maintenance supervisors were the result of either poor commu- 
nication of maintenance information, or poor problem-solving (i.e., diagno- 
sis). The role of effective communication and interpretation of maintenance 
information is critical to effective maintenance, and an antecedent to error 
when such communication and interpretation are poor. To perform this step
properly a maintenance technician must know the type of anomaly the 
operator described and its possible causes, or fully understand the written 
account of the maintenance to be performed. In addition, he or she should be 
sufficiently familiar with the system and its components and subsystems to 
apply the description of the anomaly or maintenance actions to the compo- 
nent or subsystem in question. 

Further, some organizations do not perform their own maintenance but 
contract third parties to conduct both their scheduled and nonscheduled 
maintenance on their behalf. These third-party maintenance contractors are
independent of the operating organizations, and the greater oversight distance
between the organization contracting and overseeing the maintenance and the
one conducting it increases the potential for maintenance errors. Drury, Guy,
and Wenner (2010) examined the potential for error arising from outsourcing
maintenance to third parties. They found that when third parties are
contracted to maintain the systems of others, maintenance personnel
communicate through their own supervisors, who then communicate with the
operating company and report back to the personnel. The added layers of
communication between those who maintain systems and those who operate
them increase the opportunities for communication errors.

Most regulators consider the operating company to be ultimately respon- 
sible for the maintenance performed on its system, regardless of where the 
maintenance is conducted. Regulators may, depending on the industry, 
establish a floor of minimally acceptable maintenance standards and per- 
sonnel qualifications for the companies they oversee. Nevertheless, increas- 
ing the oversight distance between the company operating the system and 
the company maintaining it allows antecedents to error to be introduced, 
a result of the maintenance organization's unfamiliarity with a company's 
standards and how the standards are expected to be applied. In addition, 
third-party contractors may have inventory systems for parts that differ
from those of the company, and thus will need to merge their systems with
those of the company or learn to use the company’s systems. Third-party
maintenance contractors may also be more likely to misinterpret a compa- 
ny’s instructions than would those working directly for that company. 

Third-party maintenance organizations may themselves contract out the 
hiring of maintenance personnel to other companies, as was seen in the 
DC-9 accident in the Florida Everglades described in Chapter 10 (National 
Transportation Safety Board, 1997). Maintenance personnel from these orga-
nizations can be expected to have little loyalty to the organization for whom
they perform the maintenance, and should not be expected to be familiar 
with the operating organization's way of communicating and conducting 
maintenance, as would be expected of those in the direct employ of the oper- 
ating company. 

Act


After interpreting and diagnosing, technicians generally perform one or
more of the following actions, each of which calls for different skills:

* Removing a component and replacing it with a repaired, reassembled,
  or new component
* Repairing a component
* Disassembling and reassembling a component
* Disconnecting and reconnecting a component
* Emptying and replacing fluids


Reinstalling and reconnecting components and replacing fluids offer
numerous opportunities for error.

Reason (1997) suggests that because of the numerous ways that compo- 
nents can be incorrectly reassembled and reconnected, or the ways that 
incorrect types or quantities of fluids can be used, in contrast to the single 
way that they can be removed or disassembled, maintenance personnel are 
more likely to commit errors when completing these tasks than when per- 
forming other maintenance tasks. Reassembling components not only calls 
for using the correct component, but also the correct attachment parts, for 
example, screws and bolts, in the correct amount, placed in the correct loca- 
tion, with the correct torque or tension. Errors can occur in any one of the 
maintenance activities. 

This maintenance step does, however, require little cognitive effort. Hobbs
and Williamson (2002) applied Rasmussen's (1983) model of skill-, rule-, and 
knowledge-based performance to maintenance tasks and found, in a study 
of errors of aviation mechanics, that the fewest errors were considered to 
involve skill-based tasks. Such tasks, whether maintenance or another type,
become mastered when people perform them multiple times. Over time 
the tasks can be conducted with little conscious thought, and hence with 
fewer errors. By contrast, they found that maintenance supervisors spent 
considerably more of their time overseeing the work of others, and hence 
on knowledge-based tasks, which required a higher level of cognitive effort 
than action tasks, and thus were more prone to error. 


Evaluate 


After receiving instructions, maintenance personnel inspect components 
and may observe the operation of the system to diagnose the anomaly or 
evaluate the type of maintenance actions needed. Evaluation and inspec- 
tion are subject to distinctive antecedents and, depending on the proce- 
dures, the setting, and the type of task performed, the rate of detecting flaws 
may vary. For example, Leach and Morris (1998), who studied maintenance
actions in an unusual environment, welds performed underwater, found 
that inspectors failed to detect defects at a rate well above the accepted 
tolerance level, with no correlation between experience of the inspector 
and error rate. Ostrom and Wilhelmsen (2008) found that technicians made
considerably more errors in detecting dents in a simulated “real world” setting,
in which flashlight lighting was used with dirt and grime covering the
area to be examined, than under ideal conditions with ideal lighting and
little dirt and grime. It can be seen that, as a result of the unique settings in
which maintenance personnel search for flaws, inspection tasks are prone 
to error, regardless of the experience of the particular person conducting 
the inspection. 

Often the defects that inspectors are asked to identify are barely per- 
ceptible. Because many of the components that they inspect are flawless, 
inspectors may have little or no history of detecting flaws and, as impor- 
tant, little or no expectation of finding them. Their expectancies could 
affect their detection of flaws since previous experience helps to guide 
visual searches. Previous experience can help inspectors locate and iden- 
tify flaws, but it could also lead to potential error-inducing expectancies 
when flaws had not been encountered previously. In general, because most 
maintenance tasks are performed without error, inspectors may become 
habituated to expect satisfactory results when examining the products of 
the maintenance activities of others, thus degrading their ability to locate 
flaws. 

Components may lack indices that can assist inspectors in searching all 
the critical areas they are examining. Specialized tools have been designed 
to highlight defects in some types of inspections; however, the effective- 
ness of these tools is largely dependent on the user's expertise. Here again, 
previous experience in detecting flaws can increase an inspector's ability to 
detect subsequent flaws. Further, because these tools can place considerable 
demands on inspectors’ vigilance, over extended periods of time their visual 
searching and monitoring skills will degrade. 

Failure to detect flaws has led to accidents. An inspector failed to detect a 
flaw in a jet engine disc during a scheduled engine inspection. The flaw con- 
tinued to grow and propagate as the engine was subject to repeated operat- 
ing cycles. Nine months after an inspector had looked for and missed a flaw 
using an instrument designed to enhance the conspicuity of flaws, the part 
disintegrated as the airplane was about to take off, leading to the death of 
two passengers (National Transportation Safety Board, 1998). Investigators 
described the difficulties the inspector encountered in searching for the
defect:


To detect the crack on the aft-face of the hub, the inspector would have 
had to first detect a bright fluorescent green indication (if there was such 
an indication) against a dark purple background. To detect the indication, 
the inspector would have had to systematically direct his gaze across all
surfaces of the hub. However, systematic visual search is difficult and 
vulnerable to human error. Research on visual inspection of airframe 
components, for example, has demonstrated that inspectors miss cracks 
above the threshold for detection at times because they fail to scan an 
area of a component. It is also possible that the inspector detected an 
indication at the location of the crack but forgot to diagnose, or reinspect, 
the location. [Moreover,] a low expectation of finding a crack might also 
have decreased the inspector's vigilance. Further, research on vigilance 
suggests that performance decreases with increasing inspection time. 
(pp. 63 and 64) 





The Maintenance Environment 


Unlike operator control stations in complex systems, which are often climate 
controlled, well lit, and quiet, maintenance environments are susceptible to 
noise, temperature variations, and poor illumination, among other challenging
conditions. Maintenance environments are far more conducive to errors than
are typical operational environments, as they are often exposed to
large-scale temperature variations and considerable, intrusive background
noise (Bosley et al., 2000). In addition, unlike many system environments
where operator controls are either illuminated when needed or provided 
with sufficient background illumination for operator use, maintenance envi- 
ronments may lack proper illumination. 


Lighting 


The environment in which maintenance is conducted is often provided with 
external light that is designed to illuminate the general work environment 
and not the areas on which maintenance personnel focus. Illumination 
shortcomings can be present in confined or enclosed spaces, or in open 
spaces where general overhead lighting is the primary source of illumina- 
tion. Technicians could employ portable lighting fixtures to compensate for 
these deficiencies; however, using handheld fixtures requires technicians
to dedicate one hand to holding the fixture, leaving only the other hand 
for the maintenance task itself. If hands-free operation is not possible, the 
technician's ability to work effectively will be impeded, either because of 
constraints from the use of only one hand, or because of the poor illumina- 
tion, leading to a maintenance or inspection error. However, the availability 
of relatively inexpensive, wearable, lightweight, battery-operated white LED
(light-emitting diode) lights has mitigated, to some extent, the lighting short- 
comings of many maintenance environments. 

Noise


Maintenance environments can be quite noisy; many are not sound-controlled,
and ambient noise from ongoing activities may interfere with technicians’
work. As discussed in Chapter 4, sounds can distract operators and
interfere with their job performance. If sufficiently loud, noise can also limit 
technicians’ ability to converse or to hear verbal instructions. Although main- 
tenance personnel can wear protective devices to limit the adverse effects of 
noise, these devices can interfere with their duties if they are uncomfortable, 
hinder conversation, or restrict movement. 


Environment 


Technicians may be exposed to wide variations in temperature and humid- 
ity because maintenance tasks are often performed outdoors or in environ- 
ments that are not climate controlled. People perform effectively at a fairly 
narrow temperature and humidity range. Although cultural and geographi- 
cal factors affect sensitivity, research has demonstrated that as temperatures 
extend beyond a fairly narrow range, from about 60°F or 15°C to about 90°F 
or 35°C (e.g., Ellis, 1982; van Orden et al., 1996; Wyon et al., 1996; Bosley et al., 
2000), ambient temperature increasingly becomes a stressor that adversely 
affects operator performance. Humidity increases sensitivity to temperature 
and exacerbates its effects. The higher the relative humidity, the narrower 
the temperature range in which people can work without being affected by 
the temperature-related stress. 

Other potentially adverse environmental factors can also degrade performance
in the maintenance environment. These include pollution, such
as smoke, dust, or allergens; strong odors; and vibrations from the
equipment or its operating environment.


Accessibility 


Tight quarters, exposed wires, chemicals, moving objects, protruding or 
sharp objects, the lack of protective barriers high above the ground, and 
other hazards may be present in the maintenance environment. These haz- 
ards, and technicians’ concern for and efforts directed at self-protection, 
can degrade performance. Although technicians can wear protective equipment in
hazardous conditions, the protection itself may interfere with their
mobility and thus lead to performance errors. To illustrate, heavy gloves that
protect operators from sharp objects also restrict their dexterity. 

Features of the maintenance environment played a role in technicians’ 
errors in an incident in which one of the four engines on a Boeing 747 became 
partially detached from its pylon—the component that attaches the engine 
to the wing—upon landing (National Transportation Safety Board, 1994). 
No one was injured in the accident, but had the engine fallen off while the 
airplane was in flight, the safety of flight could have been compromised.
Investigators found that the pin that connected the engine to the pylon was 
missing. About a week before the incident a mechanic who had performed 
maintenance on the pylon failed to reinstall the pin. Following the mechan- 
ic’s actions an inspector examined the pylon, but he failed to notice the miss- 
ing pin. These maintenance and inspection errors led to the incident. 

Investigators found that the relative inaccessibility and poor illumina- 
tion of the critical component played a part in the errors. The inspector had 
to lean about 30° to the side, from scaffolding without barriers, to inspect 
the component. The scaffolding itself was located about 30 feet or 10 meters 
above a concrete floor. As investigators report (National Transportation 
Safety Board, 1994), 


The combination of location of the scaffolding (at a level just below the 
underside of the wing that forced him [the inspector] into unusual and 
uncomfortable physical positions) and inadequate lighting from the base 
of the scaffolding up toward the pylon, hampered his inspection efforts. 
Moreover, portable fluorescent lights that had been placed along the 
floor of the scaffolding illuminated the underside of the pylon. These 
lights had previously been covered with the residue of numerous paint 
applications that diminished their brightness. (p. 30) 


Tools and Parts 


Poorly designed tools, parts, and equipment can degrade maintenance qual- 
ity and lead to error. They can be awkward to hold, difficult to use, block the 
technician’s view of the maintenance action, or present technicians with any 
of a number of difficulties. Reason and Hobbs (2003), referring to factors “among the
most influential local conditions influencing work quality,” note that poorly 
designed tools and misidentified or missing parts can play critical roles in 
maintenance errors (p. 67). 

Poorly labeled or poorly designed parts may be used incorrectly or may 
be applied to the incorrect component. Many parts are relatively indistin- 
guishable, differing in seemingly imperceptible ways, yet the use of even 
“slightly” different parts can lead to an accident. In addition, since those 
who store parts do not generally perform the maintenance using those parts, 
they may not appreciate the needs of those who do. They may insufficiently 
differentiate among parts when storing them, or fail to label them clearly, 
actions that can lead technicians to select incorrect parts for the critical tasks. 


Time Pressure 


Although many tasks that operators perform in complex systems are carried 
out under some time pressure, the pressures on maintenance technicians 
may be considerable. In many systems, operations are stopped or slowed 
at regular intervals, providing windows for scheduled maintenance to be
performed without adversely affecting operations. For example, some urban 
mass transit systems shut down in the late night/early morning hours, allow- 
ing maintenance to be performed without interrupting system operations, 
as would be the case if performed in the daytime. However, those systems 
resume operations at set times and maintenance activities must be completed 
within the required time frames to allow operations to resume. Technicians 
conducting maintenance during those periods are aware that they must complete
it within the allotted time. Inflexibility in the time available to complete the
work can pressure personnel to hurry through tasks if they have encountered
unexpected problems or the tasks took longer than expected.

The pressure to complete tasks can lead technicians to skip steps, or to
conduct hurried maintenance or inspection activities, thereby increasing 
opportunities for error. This was the case, for example, in a maintenance- 
related accident that occurred when a windscreen of a BAC 1-11 aircraft blew 
out as the airplane was climbing, because the maintenance technician who 
conducted a scheduled maintenance task the night before the accident used 
bolts slightly smaller than required to retain the windscreen in place (Air 
Accidents Investigation Branch, 1992). The maintenance was performed at 
night, the shift was ending, and the maintenance crew was short-staffed. 
A supervisor conducted the maintenance himself rather than assign it to a 
subordinate in order to expedite its completion, to increase the likelihood 
that the airplane would be returned to service in the morning. However, he 
inadvertently used incorrectly sized bolts to retain the windscreen. As the 
airplane climbed, the ambient pressure outside the airplane decreased, the 
cockpit and cabin pressure remained the same, and the differential pres- 
sure on the windscreen increased to the point that it could not be retained 
in place. The error arose from a variety of antecedents: the supervisor
knew that the airplane needed to return to service, he was under pressure
to obtain the parts (the bolts) needed to complete the task, he did not recognize
that he had obtained bolts that were slightly smaller than the parts
needed for the task, and upon completion his work was not inspected by
another person.





Case Study 


In January 2003 a Raytheon (Beechcraft) 1900D, with 19 passengers and two 
pilots onboard, crashed shortly after takeoff from Charlotte, North Carolina, 
killing all 21 onboard (National Transportation Safety Board, 2004). As the 
aircraft lifted off the runway and began its climb, its nose continued to 
pitch upward but the pilots were unable to decrease the pitch, despite their 
continued efforts. Just over a minute after takeoff the airplane crashed into a
maintenance hangar on the airport property. 

Two nights before the accident maintenance personnel performed sched- 
uled maintenance to check and, if necessary, readjust the tension of cables 
connecting the pilots’ control columns to the elevators, the control surfaces 
that lower or raise the aircraft tail, to increase or decrease its pitch in flight. 
The airline had contracted with a third party to perform maintenance on 
its aircraft. At the facility in which the maintenance was performed that 
contractor had then contracted with another company to provide it with the 
technicians who would actually perform the aircraft maintenance. 

The personnel involved in the critical maintenance
actions were the quality assurance technician, who inspected the completed 
maintenance actions, the site facility manager, who represented the com- 
pany that maintained the aircraft, the maintenance foreman who oversaw 
the maintenance task assignments, the technician who worked on the air- 
plane, and the airline’s site manager, who was responsible for ensuring that 
the maintenance was carried out in accordance with the airline’s Federal 
Aviation Administration-accepted maintenance program. The airline’s representative,
employed by the airline, and the site facility manager, employed by
the company that maintained the aircraft, both worked at the maintenance 
facility during daytime hours, Monday through Friday. The maintenance 
personnel, who worked for the company that supplied the maintenance per- 
sonnel, worked at nights. 

Investigators determined that the actual weight of the passengers and their
baggage exceeded the average weights that the airline had
used to make weight and balance determinations for its aircraft, and as a
result the passenger and baggage weight distribution was further aft than 
it should have been, thereby making the airplane tail heavy and prone to 
pitching up. Ordinarily pilots can counter this by lowering the nose through 
the airplane’s control column, but they were unable to do so on this flight 
and the airplane’s pitch continued to increase until the airplane crashed. 
The airline, like most, had used an average passenger and baggage weight 
that the Federal Aviation Administration had approved some years before. 
However, since the weight calculations were first developed the weight of 
the average American, as well as that of others around the world, had increased
and the weights that the Federal Aviation Administration had approved for 
the average American passenger were, in effect, no longer valid. Further, the 
average passenger gender distribution was based on a mix of 60% male and 
40% female passengers, whereas on the accident flight 16 male and three
female passengers were onboard. After the accident, at the request of the 
Federal Aviation Administration, several airlines tallied the actual weights 
of their passengers over a 3-day period. They found that the Federal Aviation
Administration-approved average passenger weight that airlines had been using
was more than 10% below the actual average passenger weight,
and that actual baggage weights exceeded the approved average baggage
weights by about a third.

The aircraft’s control column’s forward deflection was found to have been 
limited to about 7° rather than the 14° to 15° specified in the airline's maintenance
manual. Investigators learned that a downward deflection of about
9° was needed to safely fly the airplane with the aft center of gravity load- 
ing it had at the time of the accident. The aft weight distribution that led to 
the airplane’s pitch up after takeoff should not, by itself, have prevented the 
pilots from controlling the airplane and the pilots’ words heard on the cock- 
pit voice recorder indicated that they were actively trying to lower the nose. 
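
The relationship among these three deflection figures can be made explicit with a short calculation. The Python sketch below is illustrative only: the class and field names are hypothetical, the values are the approximate figures given above, and the lower end of the 14° to 15° range is assumed to stand for the specified travel. It is not taken from the investigation report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeflectionCheck:
    """Compares available forward deflection with what was specified and needed.

    All names here are hypothetical; angles are in degrees.
    """
    available_deg: float   # deflection actually available after maintenance
    required_deg: float    # deflection needed for the loading on the flight
    specified_deg: float   # minimum deflection called for in the manual

    @property
    def shortfall_deg(self) -> float:
        # How far the available deflection falls short of what was needed.
        return max(0.0, self.required_deg - self.available_deg)

    @property
    def within_specification(self) -> bool:
        return self.available_deg >= self.specified_deg

    @property
    def controllable(self) -> bool:
        # True when the available deflection covers the required deflection.
        return self.available_deg >= self.required_deg


# Approximate figures from the text: about 7 available, about 9 needed,
# 14 to 15 specified (14 assumed here as the lower bound).
accident = DeflectionCheck(available_deg=7.0, required_deg=9.0, specified_deg=14.0)

# The same loading with the deflection rigged to the specified minimum.
as_specified = DeflectionCheck(available_deg=14.0, required_deg=9.0, specified_deg=14.0)

print(accident.shortfall_deg, accident.controllable)
print(as_specified.shortfall_deg, as_specified.controllable)
```

Under these assumed figures, the misrigged airplane was about 2° short of the deflection it needed, whereas an airplane rigged to specification would have had several degrees to spare with the same aft loading.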

Investigators found multiple errors and error antecedents in the maintenance
conducted two nights before the flight. These caused the forward control
column limitation that, together with the aft center of gravity, led to
the accident. The maintenance technician who worked on the cable tension
had omitted several steps in adjusting the cable tension. In particular, he
did not calibrate the tension after he had readjusted it. Further, the quality
assurance inspector was aware of and approved the technician’s omission
of the steps; neither believed that all the steps called for to adjust the tension
were necessary, despite the steps being explicitly listed in the maintenance
manual, and therefore required. Had the required maintenance actions been
carried out, the actual column deflection would have been checked, the
cable tension calibrated, and the incorrect cable tension recognized.

The inspector had been employed by the maintenance contractor about 6 
months before the accident, and the maintenance technician, employed by 
the company that provided maintenance personnel to the maintenance con- 
tractor, about 2 months before the accident. In this respect, their tenure with 
that company was generally consistent with that of their colleagues, the aver- 
age length of service being about 3 months. Moreover, investigators found 
that the absence of either the airline representative or the maintenance con- 
tractor's site facility manager at night, when much of the critical maintenance 
was performed, resulted in the absence of someone familiar with company 
maintenance procedures, a person who could have informed them of the 
need to follow all the required steps of the maintenance procedures when a 
technician had sought to omit required maintenance actions. Their absence 
also resulted in their not observing much of the maintenance that was con- 
ducted on company aircraft, and not noticing the extent to which steps listed 
in the maintenance manual were not followed. 

Investigators also learned that the technician who performed the cable 
maintenance had not performed that action on the model of airplane 
involved in the accident, although he had done so on a different type of aircraft.
Training on maintenance tasks at the facility was conducted through
on-the-job training (OJT). The quality assurance inspector, who provided the
OJT to the maintenance technician on this task, told investigators that he did
not believe that he needed to closely supervise the technician because of the
technician’s previous experience with that task, even though it was on a type
of aircraft that bore only slight resemblance to the airplane under maintenance.
Investigators also found that the written instructions regarding the cable 
adjustment procedures were located in two separate documents. One step 
was documented in a second manual, thereby calling for technicians, during 
the procedure, to access a second document in addition to the one guiding 
them in the tasks. By forcing the technicians, while following the mainte- 
nance steps, to stop their work, consult a second manual, follow that step 
and then return to the steps in the other manual, an additional error ante- 
cedent was introduced into the process. Finally, after the maintenance was 
conducted, its quality was assessed by the quality assurance inspector, the 
person who had approved omitting the critical maintenance steps. Although 
applicable regulations did not prohibit inspectors from overseeing or training 
personnel performing maintenance tasks and then inspecting the completion 
of those tasks, investigators criticized the maintenance contractor’s doing so, 
and the airline’s representative for not recognizing and prohibiting this prac- 
tice. As investigators wrote (National Transportation Safety Board, 2004), 


The inspectors cannot properly fulfill their RII [required inspection
item: tasks that require separate inspection] responsibilities in such a
situation. The purpose of an RII inspection is to provide “a second set of 
eyes” to ensure that any error made in performing maintenance work is 
detected and corrected before an airplane is returned to service. (p. 107) 


Thus, a simple maintenance error, incorrectly adjusting cable tension, led 
to misrigged cables that prevented the pilots from recovering from an oth- 
erwise correctable airplane pitch up. Because the passenger and baggage 
weights and distribution were inaccurate, the airplane was loaded tail heavy, 
causing the pitch up. The error in rigging the cables was committed by an 
ill-trained technician who had little experience with the airplane type or 
with the airline for whom he was performing maintenance actions. He was 
employed neither by the airline nor by the maintenance third party the air- 
line had retained to conduct its maintenance but by a contractor that pro- 
vided maintenance personnel to the maintenance third party. Cumbersome 
written instructions describing the maintenance actions may have played a 
role in the mechanic’s omission and the inspector’s approval of the omission 
and subsequent failure to recognize its consequences, the misrigged cables. 





Summary 


Maintenance largely involves three critical acts: recognizing and understanding
the maintenance to be performed, carrying out the maintenance,
and evaluating or inspecting the effectiveness of the action taken. Errors
can take place even before the maintenance task itself is performed, in the 
instructions given to or received by the maintenance specialist, as well as 
because of aspects of the maintenance environment, the task itself, or the 
tools and parts used. 

The maintenance environment differs substantially from those of most 
system control stations in complex systems. The environment may be subject 
to temperature extremes, poor illumination, distracting ambient noise, and 
difficult-to-access components.

Flaws or defects that inspectors examine may be relatively inconspicu- 
ous and difficult to detect. Inspectors may use devices to enhance flaw and 
defect conspicuity, but after extended visual inspection their effectiveness 
in detecting flaws will deteriorate. Previous experience detecting flaws will 
affect inspectors’ expectations and the likelihood of finding flaws in subse- 
quent inspections. 


DOCUMENTING MAINTENANCE ANTECEDENTS 


MAINTENANCE AND INSPECTION 


* If there is reason to believe that it was uncomfortably hot or
  cold at the time the maintenance was performed, and the actual
  temperature and humidity at that time are not available, measure
  the temperature and humidity several times a day over a period of
  several days, as close as possible to the time of the accident, and
  use the average temperature and humidity.

* Identify company deadlines to complete the required tasks.

* Evaluate the nature of written or verbal maintenance instructions,
  information on equipment, tools, and parts used, difficulties
  in their use, company training, and other relevant
  information by interviewing those who performed the critical
  maintenance, or were performing maintenance nearby at the
  time.

* Measure the illumination of the critical component and document
  the source of the light, its power, preferably in foot-candles or
  meter-candles, the area being illuminated, and the location of
  blind spots, dark areas, or other deficiencies.

* Measure the ambient sound level at a time close to the time of
  the event if sound recordings at the time of the accident are not
  available.

* Obtain data concerning the work of the maintenance personnel
  at the time of the event from security cameras, audio recorders,
  and other devices.

* Determine accessibility to the component of interest from the
  location at which the maintenance technician performed the
  maintenance or inspection.

* Document material or components that blocked access to a component,
  and any exposed wiring, hazardous chemicals, or other
  potentially dangerous material in and around the work area.

* Determine the number of tasks the technician was scheduled
  to perform around the time the critical maintenance was carried
  out.

* Assess the time remaining between the completion of the
  required maintenance action and the end of the shift and the
  time when the component was to be returned to service.


MAINTENANCE 


* Measure differences among the dimensions of tools and parts
  used in the maintenance task and those of tools and parts
  approved for the task.

* Determine the availability of documentation on the correct
  tools or parts to be used and impediments to accessing the
  information.

* Examine the accessibility of the correct parts, and document
  the type and availability of information available to operators
  that described the parts to be used.

* Note the distance between the location of the parts and the
  location of the component that was maintained.


INSPECTION 


* Determine the amount of time the inspector devoted to
  inspection.

* Note the number of inspections of the particular component the
  inspector had carried out previously, and the number of times
  the inspector found flaws, defects, or incomplete maintenance.

* Measure the size of the flaw if possible, and document features
  that distinguish a flawless component from a flawed one, or a
  completed maintenance task from an incomplete one.


THE TECHNICIAN


* Obtain medical, financial, and personal information, as appropriate,
  to document potential operator antecedents that could
  pertain to the technicians.

* Document the technician's visual acuity, particularly for near
  vision. Ask the maintenance technician to obtain a visual
  examination, if possible, if a year or more has elapsed since the
  technician's vision was last assessed.

* Determine whether the technician wore corrective lenses at the
  time of the event, and the extent to which the lenses corrected
  for any visual impairment.
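
The temperature and humidity item in the checklist above can be sketched as a simple averaging step. The Python below is a minimal illustration, not a prescribed method: the `Reading` type, the `average_conditions` function, the one-hour window, and all sample values are hypothetical.

```python
from statistics import mean
from typing import NamedTuple, Sequence, Tuple


class Reading(NamedTuple):
    """One measurement taken after the event (hypothetical structure)."""
    day: str
    hour: int                 # hour of day, 0-23
    temp_f: float             # temperature in degrees Fahrenheit
    rel_humidity_pct: float   # relative humidity in percent


def average_conditions(
    readings: Sequence[Reading], target_hour: int, window_hours: int = 1
) -> Tuple[float, float]:
    """Average the readings taken within window_hours of target_hour.

    This simple comparison does not wrap around midnight.
    """
    nearby = [r for r in readings if abs(r.hour - target_hour) <= window_hours]
    if not nearby:
        raise ValueError("no readings close to the time of interest")
    return (
        mean(r.temp_f for r in nearby),
        mean(r.rel_humidity_pct for r in nearby),
    )


# Hypothetical readings collected over three days; maintenance assumed at 02:00.
sample = [
    Reading("day 1", 2, 40.0, 80.0),
    Reading("day 2", 2, 44.0, 70.0),
    Reading("day 3", 3, 42.0, 75.0),
    Reading("day 1", 14, 60.0, 50.0),   # afternoon reading, outside the window
]

avg_temp, avg_humidity = average_conditions(sample, target_hour=2)
print(avg_temp, avg_humidity)
```

Restricting the average to readings taken near the hour of interest reflects the checklist's emphasis on measuring as close as possible to the time of the accident; an all-day average would blend in conditions the technician never worked in.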




References 


Air Accidents Investigation Branch. 1992. Report on the accident to BAC One-Eleven,
G-BJRT, over Didcot, Oxfordshire, on 10 June 1990. Aircraft Accident
Report No. 1/92. London: Department of Transport.

Antonovsky, A., Pollock, C., and Straker, L. 2014. Identification of the human factors 
contributing to maintenance failures in a petroleum operation. Human Factors, 
56, 306-321. 

Bosley, G. C., Miller, R. M., and Watson, J. 2000. Evaluation of aviation maintenance 
working environments, fatigue and maintenance errors/accidents. Federal Aviation 
Administration, Office of Aviation Medicine, Washington, DC. 

Drury, C. G. 1998. Human factors in aviation maintenance. In D. J. Garland, J. A.
Wise, and V. D. Hopkin (Eds.), Handbook of aviation human factors (pp. 591-606). 
Mahwah, NJ: Erlbaum. 

Drury, C. G., Guy, K. P., and Wenner, C. A. 2008. Outsourcing aviation maintenance: 
Human factors implications, specifically for communications. International 
Journal of Aviation Psychology, 20, 124-143. 

Ellis, H. D. 1982. The effects of cold on the performance of serial choice reaction time 
and various discrete tasks. Human Factors, 24, 589-598. 

Hobbs, A. and Kanki, B. G. 2008. Patterns of error in confidential maintenance inci- 
dent reports. International Journal of Aviation Psychology, 18, 5-16. 

Hobbs, A. and Williamson, A. 2002. Skills, rules and knowledge in aircraft mainte- 
nance: Errors in context. Ergonomics, 45, 290-308. 

Hobbs, A. and Williamson, A. 2003. Associations between errors and contributing 
factors in aircraft maintenance. Human Factors, 45, 186-201. 

Leach, J. and Morris, P. E. 1998. Cognitive factors in the close visual and magnetic 
particle inspections of welds underwater. Human Factors, 40, 187-197. 

Munro, P. A., Kanki, B. G., and Jordan, K. 2008. Beyond “inop”: Logbook commu- 
nication between airline pilots and mechanics. International Journal of Aviation 
Psychology, 18, 86-103. 

National Transportation Safety Board. 1994. Special investigation report, maintenance 
anomaly resulting in dragged engine during landing rollout, Northwest Airlines flight 
18, Boeing 747-251b, N637US, New Tokyo International Airport, Narita, Japan, March 
1, 1994. Report Number: SIR-94-02. Washington, DC: National Transportation 
Safety Board. 


236 Investigating Human Error 


National Transportation Safety Board. 1997. Aircraft accident report, in-flight fire and 
impact with terrain, ValuJet Airlines, flight 592, DC-9-32, N904V], Everglades, Near 
Miami, Florida, May 11, 1996. Report Number: AAR-97-06. Washington, DC: 
National Transportation Safety Board. 

National Transportation Safety Board. 1998. Aircraft accident report, uncontained engine 
failure, Delta Airlines flight 1288, McDonnell Douglas MD-88, N927DA, Pensacola, 
Florida, July 6, 1996. Report Number: AAR-98-01. Washington, DC: National 
Transportation Safety Board. 

National Transportation Safety Board. 2004. Aircraft accident report, loss of pitch 
control during takeoff, Air Midwest Flight 5481, Raytheon (Beechcraft) 1900D, 
N233YV, Charlotte, North Carolina, January 28, 2003. Report Number AAR-04-01. 
Washington, DC: National Transportation Safety Board. 

Ostrom, L. T. and Wilhelmsen, C. A. 2008. Developing risk models for aviation main- 
tenance and inspection. International Journal of Aviation Psychology, 18, 30-42. 

Rankin, W., Hibit, R., Allen, J., and Sargent, R. 2000. Development and evaluation of 
the Maintenance Error Decision Aid (MEDA) process. International Journal of 
Industrial Ergonomics, 26, 261-276. 

Rasmussen, J. 1983. Skill, rules, and knowledge; Signals, signs and symbols, and 
other distinctions in human performance models. IEEE Transactions on Systems, 
Man and Cybernetics, 13, 257-266. 

Reason, J. T. 1997. Managing the risks of organizational accidents. Aldershot, England: 
Ashgate. 

Reason, J. T. and Hobbs, A. 2003. Managing maintenance error. Aldershot, England: 
Ashgate. 

van Orden, K. F., Benoit, S. L., and Osga, G. A. 1996. Effects of cold air stress on the 
performance of a command and control task. Human Factors, 38, 130-141. 

Ward, M., McDonald, N., Morrison, R., Gaynor, D., and Nugent, T. 2010. A perfor- 
mance improvement case study in aircraft maintenance and its implications for 
hazard identification. Ergonomics, 53, 247-267. 

Wyon, D. P., Wyon, L, and Norin, F. 1996. Effects of moderate heat stress on driver 
vigilance in a moving vehicle. Ergonomics, 39, 61-75. 


14 


Situation Awareness and Decision Making 











The impetus of existing plans is always stronger than the impulse to 
change. 
Tuchman, 1962 
The Guns of August 


Introduction 


Throughout history poor decision making has led to undesired outcomes. In 
the first days of World War I, for example, the military commanders and the 
heads of state and government of several countries made decisions that, in 
hindsight, led to disastrous consequences (Tuchman, 1962; Clark, 2012). 
In the face of overwhelming evidence that their initial plans needed to be 
revised in the light of battlefield conditions, they refused to do so, decisions 
that ultimately led to their countries’ defeat. 

In complex systems decision-making errors have led to accidents. These 
errors often result from deficiencies in operator situation awareness. Because 
of the importance of situation awareness to decision making, and the impor- 
tance of decision making to system safety, investigators need to understand 
both to effectively investigate error. In particular, factors affecting situation 
awareness, the relationship of situation awareness to operator decision mak- 
ing, and the effects of deficient decision making on operator errors, need to 
be examined to properly understand decision making and decision-making 
errors. This chapter will review situation awareness and decision making, 
and discuss the types of data needed to assess decision making when 
conducting error investigations. 


Situation Assessment and Situation Awareness 


Situation awareness has, in recent years, become a widely used term in 
complex systems. Manufacturers have promoted equipment by its ability 
to enhance situation awareness, and managers have communicated infor- 
mation for the sake of their operators’ situation awareness. Byrne (2015) 
attributes this wide use of the term to its ease of being understood. As he 
describes (2015) 


SA [Situation Awareness] is an intuitive and intrinsically satisfying con- 
struct that is applicable to both understanding and enhancing human 
performance in applied environments. SA also serves as a great orient- 
ing term to aid in communication about human performance. That is, 
when used in either conversation or writing, it immediately alerts the 
recipient to the general area of discussion forthcoming. Whether refer- 
enced by an operator learning ways to optimize individual performance 
or by a human performance analyst providing insights into what factors 
may have led to breakdowns in performance, use of the term SA helps to 
focus discussion. (p. 85) 


However, the term has also become misused. van Winsen and Dekker 
(2015) argue that situation awareness has become so overused that it has 
taken on a meaning away from its initial derivation. As they write (van 
Winsen and Dekker, 2015) 


The wide use of SA has not only reinforced its status—the more it was 
used, the greater the consensus authority on using it—but also driven 
a gradual move away from its original purpose. It has offered new 
normative standards for behavior: Pilots now describe themselves as 
good pilots when they are situationally aware, when their decisions are 
informed by “good SA.” Soon after its conception, SA started to appear 
in accident investigation reports, where it was given great causal power 
to explain accidents... (p. 53) 


van Winsen and Dekker’s observation that situation awareness has 
appeared in accident investigation reports is accurate. However, merely 
attributing an error to a loss of situation awareness, without describing 
either how situation awareness was lost or the aspects of situation awareness 
that were lost or never attained, does not enhance the understanding of error 
or explain the actual nature of the critical error or errors. The loss of 
situation awareness or the failure to gain situation awareness is a critical 
factor in errors that can lead to accidents, but without describing the 
antecedents to the loss or absence of situation awareness, or describing how 
it was lost or not attained, the investigator has not explained how the error 
came about. 

To be able to fully describe the nature of the absence or loss of operator 
situation awareness, investigators must understand the concept and its 
application to complex systems. Situation assessment is the process of 
acquiring data or information to enable one to understand, or obtain a mental 
picture of, the immediate environment. Situation awareness is, first and 
foremost, the product of situation assessment (Endsley, 2015): a person's 
understanding, or mental picture, of his or her immediate environment. 
Situation assessment and situation awareness are closely related; at any 
point in time the two are identical. 

Endsley (1995) lists three elements that form situation awareness: (1) per- 
ceiving the status, attributes, and dynamics of relevant elements in the envi- 
ronment, referred to as Level 1 situation awareness, (2) comprehending the 
significance of these elements, Level 2 situation awareness, and (3) project- 
ing current assessment to future status, or Level 3 situation awareness. An 
operator would obtain situation awareness after receiving critical system- 
related information, understanding it, and using the information to predict 
the near-term system state. Endsley (1995, 2000) argues that situation 
awareness is based upon elements of both operators and equipment, suggestive of 
operator and equipment antecedents discussed previously. As we will see, 
several factors can lead to the inability to obtain situation awareness or to 
its loss. Perhaps most critical for our purposes is the inability to predict the 
near-term operational environment. It is not enough to know what is going 
on in the environment in which an operator is working; he or she must also 
be able to accurately predict the near-term system state. 

For our purposes, applying Endsley’s concept of situation awareness to 
complex systems, situation awareness refers to the operator’s understanding 
of what the system is doing, and what it will be doing in the short run. If the 
operator is an airline pilot, with situation awareness he or she should know 
where the airplane is, the type of weather it is encountering, and where the 
airplane will be and what type of weather it will be encountering in the near 
term. Pilots who don’t know where they are, or who don’t know how close 
they are to terrain, have either lost situation awareness with regard to their 
position or have failed to obtain it. As can be seen, because the environments 
in which complex systems operate can be considerable, situation awareness 
should be described in terms of the particular aspect of the environment to 
which one is referring. 
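
As a rough illustration only, the three levels can be sketched in a few lines of Python. The class, the function names, and the altitude figures below are hypothetical and are not drawn from Endsley's work; the sketch merely shows that knowing the current state (Levels 1 and 2) is distinct from projecting that state into the near term (Level 3).

```python
from dataclasses import dataclass


@dataclass
class Perception:
    """Level 1: cues perceived from the displays (hypothetical values)."""
    altitude_ft: float          # current height above terrain
    vertical_speed_fpm: float   # feet per minute, negative when descending


def comprehend(p: Perception) -> str:
    """Level 2: interpret what the perceived cues mean at this moment."""
    if p.vertical_speed_fpm < 0:
        return "descending"
    if p.vertical_speed_fpm > 0:
        return "climbing"
    return "level"


def project(p: Perception, minutes_ahead: float) -> float:
    """Level 3: project the current state forward, assuming nothing changes."""
    return p.altitude_ft + p.vertical_speed_fpm * minutes_ahead


cues = Perception(altitude_ft=3000.0, vertical_speed_fpm=-1500.0)
print(comprehend(cues))     # descending
print(project(cues, 2.0))   # 0.0, terrain reached in two minutes
```

An operator who stops at `comprehend` knows the airplane is descending; only the projection reveals how little time remains.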

Unless an operator is unskilled, situation awareness is rarely lost when 
operations are routine. However, when operations are disrupted, something 
unexpected arises, or routine operations are changed, operators may have to 
quickly interpret the change, diagnose the problem, or recalculate the effects 
of the change on near-term operations. In circumstances such as these, 
operators, even skilled ones, can lose situation awareness. 

To understand situation awareness, it is necessary to understand factors 
that influence each of the levels of situation awareness described. One relates 
to temporal factors, that is, the amount of time the operator has available to 
understand the situation being encountered. The more time an operator has 
available to assess a situation, the greater the likelihood that he or she will 
obtain situation awareness about that situation. Conversely, the less time 
available, the less likely the operator is to obtain situation awareness. 
Other aspects of situation awareness are influenced by operator- and 
equipment-related factors. 


Situation Awareness: Operator Elements 


Other aspects that affect situation awareness include an operator's 

* Expertise 
* Expectancies 
* Workload and attention 
* Automaticity 
* Goals 


Each can affect an operator's situation awareness and influence operator 
performance. 

Expertise. As operators gain system experience they become increasingly 
expert at interpreting system cues and as a result, they need less time and 
fewer cues to obtain situation awareness. Researchers have argued that 
speed in recognizing the circumstances they are encountering distinguishes 
experts from novices. Previous system-related experience, as Orasanu (1993) 
notes, allows experienced operators to recognize or “see” the underlying 
structures of problems more quickly than can novices. Day and Goldstone 
(2012), for example, suggested that a critical element of experts’ advantage 
over novices is in their speed of recognizing circumstances. Durso and 
Dattel (2006) described expertise as particularly useful in potentially haz- 
ardous situations, where experts are superior to novices in hazard percep- 
tion and recognition. Memory also contributes to expertise as the better the 
operators’ memory, the more experiences they can call upon to compare the 
current situation to ones they encountered previously. 

Experts and novices also differ in response to dynamic situations. Federico 
(1995) compared expert and novice military personnel who examined identi- 
cal simulated battlefield scenarios, and found that experts were considerably 
more context-dependent in evaluating situations than were novices, allowing 
them to evaluate situations more completely than novices. Experts were also 
found to multi-task better than novices. Cara and LaGrange (1999) determined 
that experienced nuclear power plant controllers anticipate events while 
they exercise system control, enabling them to quickly discern subtle system 
interactions. Endsley (2006) said that novice operators lack the knowledge 
to differentiate between information important to a particular situation and 
information that is not. “Without knowledge of the underlying relationships 
among system components,” she writes (p. 638), “novices do not realize what 
information to seek out following receipt of other information.” Given these 
differences, in comparable circumstances experts can be expected to gain sit- 
uation awareness more quickly and with fewer cues than can novices. 

Training can compensate, in part, for inexperience by providing nov- 
ice operators with system-related knowledge that experts have acquired 
(Tenney and Pew, 2006). Training can also allow operators to practice recog- 
nizing and responding to simulated unexpected or emergency system states, 
thus enabling them to respond appropriately should they encounter similar 
circumstances in actual operations. 

Expectancies. Operators’ mental models of system state guide them to the 
relevant situation cues, increasing the efficiency of their situation assess- 
ment. However, their mental models can also lead them to expect cues that 
may not be present in the actual environment. Jones (1997) and Jones and 
Endsley (2000) found that if operators’ expectancies did not match the cues 
they encountered, whether because of their own incorrect mental models 
or because circumstances had changed after they had formed their mental 
models, they often failed to perceive cues critical to situation awareness, and 
hence they retained inaccurate situation awareness. 

Unmet expectancies can also lead operators to misinterpret the cues they 
perceive, what Jones (1997) has termed “representational errors.” Unfortu- 
nately, as Endsley (2000) notes, initial situation assessments are particularly 
resistant to change after operators have received conflicting information. An 
operator with deficient initial situation awareness may have difficulty obtain- 
ing more accurate situation awareness subsequently. Should the situation 
change subtly, the operator may not recognize situational changes. 

Workload and Attention. Workload affects an operator’s ability to attend to 
and interpret necessary cues, and thus it can directly affect situation aware- 
ness. In high workload conditions, which often occur in unexpected or 
nonroutine situations, operators might work so intensely to diagnose and 
understand the situation they are encountering that they have limited spare 
cognitive capacity to attend to multiple cues. In these circumstances they 
will attend to the most salient cues available, cues that may not necessarily 
be the most informative. On the other hand, in low workload conditions 
operators may reduce their vigilance to the point that they attend to cues 
ineffectively, or fail to seek out the cues necessary for situation awareness. 

Automaticity. Operators can compensate for the effects of high workload 
by using “automaticity” when completing tasks they have performed often. 
With repetition a task can become so familiar that it can be performed with 
little conscious effort. Many automobile operators, for example, who 
repeatedly drive the same route can become so adept at the task that they devote 
little attention to it, and attend to only limited situational cues (Logan, 1988), 
as if they are going through the motions without really thinking about what they 
are doing. However, in the event that they encounter new or unexpected 
cues, operators may pay a price for automaticity because they may fail to 
notice changed cues and have difficulty modifying their situation awareness 
in response (Adams, Tenney, and Pew, 1995). 

Goals. Goals help to guide operators to the information needed for situa- 
tion awareness. A familiar aural alert, for example, guides operators to the 
information needed to comprehend the circumstances that led to the alert. The 
alert acts as a goal, orienting operators to the information they need to obtain 
and maintain situation awareness. 


Situation Awareness: Equipment Elements 


In most systems operators depend largely on displays of visual informa- 
tion to recognize system state. Particularly if the system state changes in 
unexpected ways, operators need to quickly recognize what the system is 
doing, why it is doing that, and what the system state will be in the near 
term. Poorly presented displays can adversely affect an operator’s ability to 
recognize and understand the cause of a change to system state. However, 
with sufficient experience, an operator may recognize the displays necessary 
to provide the critical information, something that inexperienced operators 
may be unable to do. Consequently, the impact of poorly presented infor- 
mation may not be equal across the board, but may affect experienced and 
inexperienced operators differently. Chapter 4 addressed many aspects of 
the presentation of system-related information and their effects on operator 
performance. The effects of two equipment features on operator situation 
awareness are addressed here: (1) cue salience and interpretability 
and (2) automation level. 

Cue Salience and Interpretability. Displays that present information poorly 
require operators to expend more effort interpreting the data than do well- 
designed displays. Equipment designers have generally taken this into 
account when designing new systems. They have replaced analog displays 
that have one-to-one relationships to system components, in which displays 
and aural alerts can closely match operators’ informational needs, with digi- 
tal displays that have more flexibility in presenting critical information to 
operators in more interpretable ways than had analog displays. Advanced 
digital displays can also present information corresponding in salience to 
the urgency of the needed response, and, if necessary, guide operators to the 
desired responses. 

In the discussion in Chapter 10 of the Boeing 737 that crashed in 
Washington, DC (National Transportation Safety Board, 1982), the pilots 
had approximately 30 seconds to understand what caused the engine instru- 
ments to display engine takeoff information in the manner presented, a 
presentation that the pilots had likely never encountered. This amount of 
time between takeoff and the decision point to safely continue or reject 
the takeoff was insufficient to enable them to interpret the data from the 
10 engine-related gauges. The pilots primarily attended to two gauges that 
presented the most salient information, those that they had typically relied 
upon within the group of 10 gauges. Unfortunately, those gauges presented 
inaccurate information. The other eight gauges presented the less salient, but 
accurate information. 

The importance of cue salience is especially evident with respect to aural 
information. When multiple aural alerts are sounded, operators attend to the 
loudest or most prominent, especially when they are under stress or 
experiencing high workload. For this reason, designers generally make the alerts 
that are associated with the most critical system states the most prominent 
of the alerts. 

Automation Level. High levels of automation remove operators from direct 
involvement in system operations and alter the system-related information 
they receive, creating what has been referred to as “out of the loop 
performance” (Endsley and Kiris, 1995). Out-of-the-loop performance takes place 
when automated systems perform many of the system monitoring tasks 
operators had performed themselves, leaving them less attentive to the systems 
than they would otherwise have been, thus potentially reducing their 
situation awareness. This can be most critical during those operating phases 
when situation awareness is most needed. Because of the importance of auto- 
mation to understanding operator error, this topic will be discussed more 
fully in Chapter 15. 





Obtaining or Losing Situation Awareness 
Obtaining Situation Awareness 


Research has increased our knowledge of how operators obtain situation 
awareness. Mumaw, Roth, Vicente, and Burns (2000) observed nuclear power 
plant controllers and found that they gained situation awareness from a vari- 
ety of sources, not exclusively from system displays and alerts, as had been 
thought. In addition to obtaining information from the displays, the opera- 
tors actively sought out information from the operating environment. For 
example, as their shifts changed, departing operators briefed incoming ones 
about system-related events, and incoming operators probed the outgoing 
ones to obtain information. They also used control room logs and interacted 
with other operators, both from their own teams and those outside their 
immediate control room operating environments. Finally, because the control 
rooms in nuclear reactors were so large, operators would often 
walk through the facility to observe operations to obtain situation awareness. 

Operators use a variety of techniques and strategies to obtain situation 
awareness. They rely extensively on displays for their information, they talk 
to other operators, they listen to changes in system-related sounds, and they 
also actively solicit and obtain information from other sources. As Mumaw 
et al. (2000) write, 


We emphasize the contribution of the various informal strategies and 
competencies that operators have developed to carry out monitoring 
effectively. Although these strategies are not part of the formal train- 
ing programs or the official operating procedures, they are extremely 
important because they facilitate the complex demands of monitoring 
and compensate for poor interface design decisions. Thus one could 
effectively argue that the system works well, not despite but because of 
operators’ deviations from formal practices. (p. 52) 


Losing Situation Awareness 


Limited operator exposure to a situation, inadequate training, poorly 
presented system information, and high workload, along with other factors, can, 
individually or in combination, adversely affect situation awareness. Factors 
such as automaticity can also affect operators' ability to deal with high 
workload situations, and limit their ability to perceive novel or unexpected cues. 

Based on their training and experience with the system and on the avail- 
able information that they obtain, operators develop mental models of sys- 
tem state, which serve as the basis for their situation awareness. However, 
they may lose situation awareness, or their mental models of the system 
states may no longer be accurate, for any number of reasons. Adams et al. 
(1995) studied the activities of airline pilots and found that pilots may fail to 
perceive cues that are subtly different from, but seemingly similar to, those 
associated with their mental models of the system state, especially if they 
are engaged in cognitively demanding tasks (see also Jones, 1997). Rather, as 
Jones and Endsley (2000) learned, operators were more likely to notice and 
alter their situation awareness when exposed to cues that were considerably 
different from those associated with their mental images. Therefore, the less 
cognitive effort operators expend in monitoring the system, the more likely 
they will fail to attend to critical situational cues. 

Nevertheless, excessive cognitive effort may also lead to deficient situation 
awareness. Adams et al. (1995) suggest that when presented with ambigu- 
ous or incomplete information, operators may expend considerable cogni- 
tive effort to interpret the information. Their efforts can be so extensive as 
to distort, diminish, or even block their ability to perceive and comprehend 
arriving information. 

Operators can also lose situation awareness when they are interrupted 
while performing a task that may require a number of steps to complete, as 
discussed in Chapter 10 with regard to the 1987 MD-80 accident in Detroit, 
Michigan. After the pilots had been interrupted during a critical checklist 
review they did not recognize that they had skipped an item in their checklist 
and consequently did not extend the flaps and slats before takeoff 
(National Transportation Safety Board, 1988), thereby limiting the aircraft’s 
aerodynamic capability that was necessary for takeoff. Yet, during periods 
of high workload operators will almost certainly face competing demands 
on their attention, and can often be interrupted during their activities. When 
returning to their tasks their ability to maintain the situation awareness that 
they had acquired before the disruption will be reduced. As Adams et al. 
(1995) write with regard to airline pilots, 


To the extent that incoming information is unrelated to the task in which 
the pilot is concurrently engaged, its interpretation must involve con- 
siderable mental workload and risk. The more time and effort the pilot 
invests in its interpretation, the greater the potential for blocking rein- 
statement of the interrupted task as well as proper interpretation of other 
available data. The less time and effort invested in its interpretation, the 
greater the likelihood of misconstruing its implications. In a nutshell, 
choosing to focus attention on one set of events can be achieved only at 
the cost of diverting attention from all others. (p. 96) 


In summary, high workload, competing task demands, and ambiguous 
cues can all contribute to an operator’s loss of situation awareness, even with 
experienced and well-trained operators. 





Decision Making 


Accurate situation awareness is necessary for decision making. Operators 
with deficient or inaccurate situation awareness have difficulty interpreting 
system-related information and are likely to commit errors. Because of the 
influence of decision making on operator performance, and the critical role 
of decision errors in system safety, investigators should be familiar with the 
process and its role in error. 

Operators generally apply one of two types of decision-making processes 
to the circumstances they encounter. One is appropriate to fairly static envi- 
ronments and the other to the dynamic environments of complex systems. 


Classical Decision Making 


In relatively static environments people usually employ “classical” decision- 
making processes, in which they, 


1. Assess the situation 
2. Identify the available options 
3. Determine the costs and benefits (relative value) of each 
4. Select the option with the lowest costs and highest benefit 


Classical decision-making scenarios generally allow decision makers sufficient 
time to effectively assess the situation, identify and evaluate the various 
options, and select the option with the greatest benefit and least perceived 
cost. Decision makers may value the benefits of a particular course of action 
as greater than the benefits of an alternative one, the costs of an alternative 
path as greater than the costs of the selected path, or both (Strauch, 2016). 

Decision makers generally complete these steps when there is sufficient 
time available, such as when making a major purchase, considering a job 
offer, selecting a candidate for a position, or even choosing the movie they will 
watch. For example, an automobile owner whose car has required multiple 
repairs may, at some point, ask whether to continue paying for repairs or 
buy a replacement vehicle instead. Evaluating the options, continuing repairs 
or buying a replacement vehicle, depends on the owner's tolerance for 
unreliability and for paying the cost of a replacement vehicle, on the cost of 
driving an older vehicle, as older ones typically require more maintenance 
than newer ones, and on other factors that are specific to individuals. 
Ultimately, the owner's decision may well reflect nothing more than the 
availability of spare cash, but most people will base the decision on a 
personal or family valuation of the costs and benefits of each alternative. The 
costs of making a bad decision in this and most classical decision-making 
scenarios are financial. Bad decisions cost the decision maker more money, 
either in the short- or long-term, than do good decisions. 
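
The classical process can be sketched as a short Python example built on the car owner's choice. The numeric valuations are invented for illustration, and collapsing costs and benefits into a single net value is a simplifying assumption rather than a method described in the sources cited here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Option:
    name: str
    benefit: float  # the decision maker's subjective valuation of benefits
    cost: float     # the decision maker's subjective valuation of costs

    @property
    def net_value(self) -> float:
        # Assumption: costs and benefits share one scale and can be netted.
        return self.benefit - self.cost


def classical_choice(options: Sequence[Option]) -> Option:
    """Evaluate every identified option and select the most favorable one."""
    return max(options, key=lambda option: option.net_value)


# Hypothetical valuations for the automobile owner's two options.
options = [
    Option("continue repairs", benefit=5.0, cost=4.0),
    Option("buy replacement", benefit=9.0, cost=6.0),
]
print(classical_choice(options).name)  # buy replacement
```

The point of the sketch is that every option is identified and valued before one is chosen, which presumes the decision maker has the time to do so.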


Naturalistic Decision Making 


In the often dynamic environments in which complex systems function, 
operators may not have sufficient time available to enable them to fully iden- 
tify and value the available alternatives. Cues may be ambiguous, conflict- 
ing, and changing, options may not be fully identifiable, and the values of 
the operator may be irrelevant to what he or she thinks can best serve the 
system needs. As Klein (1993a) explains, 


Most systems must be operated under time pressure. Many systems 
must be operated with ill-defined goals...[and] shifting goals [that] refer 
to the fact that dynamic conditions may change what is important. Data 
problems are often inescapable. Decisions are made within the context of 
larger companies [that have their own priorities]. Tasks generally involve 
some amount of teamwork and coordination among different opera- 
tors. Contextual factors such as acute stressors can come into play [such 
as] time pressure and uncertainty about data...Operators can’t follow 
carefully defined procedures [and] finally, the decisions...involve high 
stakes, often [with] risk to lives and property. (pp. 16-19) 


Rather than identify and then compare and select among alternatives, 
Klein (1993b) suggests that decision makers in dynamic situations employ 
what is referred to as recognition-primed decision making in which they rec- 
ognize a situation based on their experiences and select the course of action 
appropriate to their perception of the situation. 

Orasanu (1993) argues that naturalistic decision making, which takes place
in the “naturalistic” environments of complex systems, is quantitatively and 
qualitatively different from the processes employed in classical situations. In 
dynamic situations naturalistic decision making is faster than classical deci- 
sion making because decision makers bypass steps critical to effective deci- 
sion making—identifying and evaluating alternatives. The process can be 
effective when applied to dynamic, rapidly changing circumstances. Further,
unlike the static settings in which people employ classical decision making,
in the dynamic settings in which complex system operators make decisions,
circumstances may change rapidly and be ambiguous, and decision makers
may encounter competing goals (Orasanu and Connolly, 1993). Moreover,
unlike in classical decision-making scenarios,
errors made in naturalistic settings can be personally risky to decision mak- 
ers, posing consequences that are well beyond the potential financial costs of 
errors in classical decision-making milieus. 

The naturalistic decision-making process may not necessarily lead to the 
“best” decision for the circumstances, but it will likely be good enough for 
the particular situation, a process also known as “satisficing” (e.g., Federico, 
1995). As Orasanu (1993) notes, “A decision strategy that is ‘good’ enough, 
though not optimal, and is low in [the cognitive] cost [required to obtain the 
‘best’ decision] may be more desirable than a very costly, and perhaps only 
marginally better, decision” (p. 151). 
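
The contrast between the two processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the option names, the numeric scores, and the "good enough" threshold are all hypothetical, and recognition-primed decision making as Klein describes it rests on experience-based recognition rather than on explicit scoring. The sketch shows only the structural difference: the classical chooser evaluates every option, while the satisficing chooser stops at the first acceptable one.

```python
from typing import Callable, Iterable, Optional, Sequence, TypeVar

Option = TypeVar("Option")


def classical_choice(options: Sequence[Option],
                     net_benefit: Callable[[Option], float]) -> Option:
    """Evaluate every option and return the one with the highest net benefit."""
    return max(options, key=net_benefit)


def satisficing_choice(options: Iterable[Option],
                       net_benefit: Callable[[Option], float],
                       good_enough: float) -> Optional[Option]:
    """Return the first option that meets the threshold; evaluate no further."""
    for option in options:
        if net_benefit(option) >= good_enough:
            return option
    return None  # nothing met the threshold


# Hypothetical options, listed in the order they come to the operator's mind,
# with hypothetical net-benefit scores (assumptions for illustration only).
options = ["continue", "divert", "hold"]
scores = {"continue": 0.6, "divert": 0.9, "hold": 0.7}

best = classical_choice(options, scores.get)                 # examines all three
acceptable = satisficing_choice(options, scores.get, 0.5)    # stops at the first
```

With these assumed scores the classical chooser selects "divert", whereas the satisficing chooser settles on "continue" after a single evaluation: faster, acceptable, but not optimal.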

When the situation is highly dynamic, the decision maker experienced, 
and the available time brief, a naturalistic decision should lead to a more 
effective decision than one reached through the classical decision process. 
Because the decision will be reached quickly, it can be made with a reason- 
able likelihood of success, provided that the decision maker's initial assess- 
ment was accurate. However, should decision makers inaccurately assess the 
situation, naturalistic decision making can lead to poor decisions. In highly 
dynamic conditions, operators facing time pressure or stress will likely 
attend to the most salient cues and not necessarily the most informative ones. 
In the absence of a thorough situation assessment, operators can misperceive 
a situation and make ineffective decisions because the foundations of their 
decisions will have been flawed. As occurred in the 1982 Boeing 737 accident 
in Washington, DC (National Transportation Safety Board, 1982), the captain 
responded to cues—airspeed and engine thrust displays—that were inac- 
curate. His situation awareness was faulty and therefore his decision to take 
off was deficient, ultimately leading to the accident. 


Heuristics and Biases 


Biases in decision making refer not to dislikes of other people, but to 
decision-making processes that are “biased” in particular methods or out- 
comes. Such biases affect operator decision making in unexpected ways, 
often counter to what would be predicted exclusively by decision-making 
models. Researchers suggest that biases influence decision making for very
practical reasons. As Tversky and Kahneman (1974) suggest, in ambiguous 
situations “...people rely on a limited number of heuristic principles which 
reduce the complex tasks of assessing probabilities and predicting values to 
simpler judgmental operations” (p. 1124). 

Wickens and Hollands (2000) describe a bias that many decision makers 
demonstrate after they have made a decision. As noted at the beginning of 
this chapter, people are often reluctant to alter decisions they have made, even 
in the face of evidence suggesting that their decisions and situation assess- 
ments were faulty. As in other domains, in a complex system reluctance to 
alter a decision in the face of contrary evidence can lead to error. Orasanu, 
Martin, and Davison (1998), following up on a National Transportation
Safety Board study on pilot error accidents (National Transportation Safety
Board, 1994), attribute many of the decision-making errors that they examined
to “...errors in which the crew decided to continue with the original
plan of action in the face of cues that suggested changing the course of 
action” (p. 5), which they called “plan continuation errors.” 

Wickens and Hollands (2000) suggest that decision makers tend to seek 
information that supports their initial hypothesis or decision, and avoid or 
discount information that supports a different decision or hypothesis (what 
they refer to as disconfirmatory evidence). As they write, 


Three possible reasons for this failure to seek disconfirmatory evidence 
may be proposed: (1) People have greater cognitive difficulty dealing 
with negative information than with positive information. (2) To change 
hypotheses—abandon an old one and reformulate a new one—requires a 
higher degree of cognitive effort than does the repeated acquisition of 
information consistent with an old hypothesis. Given a certain “cost of 
thinking” and the tendency of operators, particularly when under stress, 
to avoid troubleshooting strategies that impose a heavy workload on lim- 
ited cognitive resources, operators tend to retain an old hypothesis rather 
than go to the trouble of formulating a new one. (3) In some instances, it 
may be possible for operators to influence the outcome of actions taken on 
the basis of the diagnosis, which will increase their belief that the diag- 
nosis was correct. This is the idea of the “self-fulfilling prophecy.” (p. 313) 


An operator’s reluctance or bias against altering decisions extends to a reluc- 
tance to accurately reassess the situation that led to the initial decision in the 
first place, and to an inability to accurately reevaluate the effects of that deci- 
sion on the situation itself. Consequently, if an initial decision led to adverse 
consequences, the operator may well be reluctant to revisit the decision. 


Errors Involving Naturalistic Decision Making 


Klein (1999) argues that the concept of decision errors in real world settings 
may itself have little validity because of the often-untidy nature of those 
settings. Orasanu, Martin, and Davison (2001) cite additional difficulties
with the concept of errors in naturalistic decision making. As they note, 


Defining errors in naturalistic contexts is fraught with difficulties. 
Three stand out. First, errors typically are defined as deviations from 
a criterion of accuracy. However, the “best” decision in a natural work 
environment such as aviation may not be well defined, as it often is in 
highly structured laboratory tasks. Second, a loose coupling of decision 
processes and event outcomes works against using outcomes as reliable 
indicators of decision quality. Redundancies in the system can “save” 
a poor decision from serious consequences. Conversely, even the best 
decision may be overwhelmed by events over which the decision maker 
has no control, resulting in an undesirable outcome. A third problem is 
the danger of hindsight bias...a tendency to define errors by their con- 
sequences. These difficulties suggest that a viable definition of decision 
error must take into account both the nature of the decision process and 
the event outcome. (p. 210) 


Yet, it is clear in reviewing accidents in complex systems that operators 
have made decisions that, even if appearing adequate at the time, are consid- 
ered faulty after the fact. 

This can be illustrated in a 1992 accident involving a Lockheed L-1011 
that crashed after takeoff from John F. Kennedy International Airport in 
New York (National Transportation Safety Board, 1993). The pilots received 
a false aerodynamic stall warning just after the airplane lifted off the run- 
way. They unsuccessfully attempted to return the airplane to the runway. 
By contrast, when other pilots had encountered the same false alert on that 
aircraft, at that point in the flight, they continued flying without incident, as 
called for in the airline’s procedures. However, on this flight the first officer, 
who was the pilot handling the controls and flying the airplane, immedi- 
ately said to the captain, “getting a stall” and then gave airplane control to 
the captain. 

As noted previously, an aerodynamic stall is perhaps the most critical situ- 
ation pilots could face; the airplane develops insufficient lift to remain flying 
and unless the situation is immediately corrected it will almost certainly 
crash. If this airplane was indeed about to stall, as the first officer had told 
the captain, continuing the flight would have meant almost certain catas- 
trophe. Given the airplane’s close proximity to the ground there was little 
or no possibility that the captain could have taken the necessary actions to 
avoid a stall. However, because of the airplane’s high speed and heavy take- 
off weight, with the limited runway distance remaining, attempting to land 
the airplane also meant an almost certain accident. 

The captain’s decision to land clearly led to the accident. However, because 
the first officer had erroneously interpreted the stall warning and told the 
captain that the airplane was about to stall, in the limited time available 
the captain's ability to effectively assess the situation, difficult at best, was 
almost impossible to achieve. As a result, he gave more weight to the first
officer’s pronouncement than he would likely have given otherwise. The 
captain's inaccurate situation awareness about an impending stall, albeit an 
awareness largely influenced by the erroneous assessment of another team 
member, led to the decision to attempt to land. 

The Effects of Operators in Similar Circumstances. Rhoda and Pawlak (1999) 
studied commercial flights into one airport, Dallas-Ft. Worth International 
Airport, during thunderstorms. Interestingly, they found that pilots were 
more likely to fly into thunderstorms, weather that creates substantial risk 
to flight safety, when similar aircraft were preceding them than when not. 
Operators flying similar aircraft were considerably influenced by the
decisions and actions of those who had encountered similar circumstances
just before them, and made decisions that they might not have made had
they not been following those aircraft. Operators in those situations may
reason that those ahead successfully dealt with the risk and that the risk
to them is therefore (1) manageable and (2) not as great as they might
otherwise have thought.


Decision-Making Quality versus Decision Quality 


The difficulty of examining decision-making errors in systems can be attrib- 
uted, in part, to difficulties of distinguishing between the quality of the 
decision-making process and the quality of the decision itself. The two are 
similar, but the quality of a decision should not be used to gauge the quality 
of the process used to reach that decision. Applying good decision-making 
techniques, such as systematically obtaining, soliciting, and comprehend- 
ing available system information, does not guarantee that decisions will be 
effective; poor decisions can follow good decision making. 

For example, investigators may conclude that an operator properly inter- 
preted the available information, effectively solicited information about the 
system and its operating environment, and still made a decision that later 
proved to be ineffective or, worse, led to an accident. In addition,
circumstances in complex systems may be so dynamic that the critical information
changed after an operator initially obtained situation awareness. Although 
a systematic process is required for a "good" decision, decisions will only be 
as good as the information upon which they are based, and upon the circum- 
stances being encountered. 





Case Study 


On June 14, 2003, the charter fishing vessel Taki-Tooo capsized in the Pacific
Ocean after its captain crossed the bar at the Tillamook Bay inlet en route to
the ocean for an intended day-long fishing trip. The 32-foot vessel had been
chartered by a group of friends who had traveled several hundred miles
specifically for the trip, and who had gone together on previous trips with the
captain. They had specifically asked for this captain to serve as vessel master 
on this trip. Of the 17 passengers and 2 crew onboard, the captain and 10 of 
the passengers were killed in the accident (National Transportation Safety 
Board, 2005). 

The group assembled at the marina early in the morning, departing on the vessel
about 06:05 local time. The captain transited the inlet to the bar, which
separates the inlet from the Pacific Ocean, and waited there for favorable seas;
the sea state at the bar was particularly rough that morning. Investigators
attempted to answer the question, why did the captain attempt to cross the
bar knowing that the consequences of an erroneous decision could be
catastrophic to him and his passengers?

Although recorded communications and electronic data were not avail- 
able, other data allowed investigators to reliably reconstruct the scenario. 
These data included reports from Taki-Tooo survivors and from those on the 
vessels that had crossed the bar before the Taki-Tooo captain attempted to do
so, weather data, computer records of the captain’s accessing weather data, 
Coast Guard observations, the captain's previous experience in the area, and 
in particular, reports of vessel captains who crossed the bar that morning, 
their vessel sizes, and the timing of their crossings. Together, these data
enabled investigators not only to reconstruct the accident circumstances but
also to describe the captain’s decision making.

Investigators determined that the captain was aware of the sea conditions 
that morning. The night before he had accessed weather information about 
the bar from his home computer. On the morning drives from his home to 
the marina, when serving as captain, he was reported to have listened to 
local marine weather broadcasts. Further, the U.S. Coast Guard, which main- 
tained an observation base there, warned mariners when sea conditions at 
that bar were particularly challenging and they had so warned mariners that 
morning. The U.S. National Weather Service had forecast and then issued 
small craft advisories for the area, advisories that were disseminated when
winds were between 20 and 33 knots and seas were greater than 7 feet.

After leaving the marina he arrived at the bar about 30 minutes later, 
where he waited until 07:15. Three other vessels waited with the Taki-Tooo, 
each of which successfully crossed the bar and reached the ocean. At 07:15 
this captain attempted to do the same, but the vessel was struck by a wave 
and capsized. 


Information Available 


Investigators focused on the background of the captain, and the
circumstances under which he made the decision to cross, to understand the
nature of his decision-making error. The captain was 66 years old and had
over 26 years of experience operating charter fishing vessels,
primarily in the Tillamook Bay area. He and his wife had owned the com- 
pany that owned and operated the Taki-Tooo and another vessel, but had sold 
it about 2 years before the accident. The captain agreed to serve thereafter 
on a part-time basis as captain of the Taki-Tooo upon the request of the group 
chartering the vessel. He was to be paid upon completion of the charter, but
investigators determined that he did not make the decision to cross out of
financial need. He and his wife were said to have sold the company as part of their
retirement planning. 

Thus, the captain, a mariner with years of experience operating charter 
fishing vessels in this area, was skilled, understood weather information, 
and was, because of his experience, cognizant of the hazards the sea state 
presented. In situation awareness terms, he had Levels 1 and 2 situation 
awareness. He had all of the information necessary regarding the sea state, 
Level 1, and he understood the meaning of the weather information in terms 
of the risks it posed to his vessel, Level 2. 

But investigators determined that several factors influenced his deci- 
sion. First, although his passengers would have understood had the cap- 
tain decided not to cross for safety reasons, he knew the passengers and 
was aware that they had asked for him to serve as captain. This knowledge 
would have made him less willing to disappoint them. As investigators 
write: 


...the knowledge that the passenger group chartering his vessel had
specifically requested that he serve as their master would most likely 
have subtly affected his decision not only to leave port but also to subse- 
quently cross the bar. He would have been motivated not to disappoint 
those passengers who had traveled some distance to engage in a fishing 
expedition under his command. (National Transportation Safety Board, 
2005, pp. 43-44) 


Further, by traveling to the bar and waiting, with the other vessels, for a 
sea state that would have allowed a safe crossing, he, with the other vessel 
captains, also limited his options with the passengers. Investigators write, 
with regard to the decisions of those vessel masters: 


While the decision to leave the dock to assess conditions at the bar might 
have been prudent, it also probably subtly influenced the masters’ subse- 
quent decisions to cross the bar rather than return to the dock. By load- 
ing passengers on the vessel and taking them almost as far as the bar, the 
masters’ decision-making ability to return to the dock without crossing 
the bar was diminished. To return to the dock would have meant that 
each master would have had to personally face and explain his deci- 
sion to the passengers who had prepared for the expedition and boarded 
the vessel and whose anticipation for the fishing voyage no doubt had 
increased as they neared the bar. (National Transportation Safety Board,
2005, p. 43) 


In addition, the years of experience of the captain operating charter fishing 
vessels in that area may have worked against the quality of his decision mak- 
ing. His previous efforts crossing the bar had all been successful. As a result, 
even given his awareness of the risks of crossing, experiencing only success- 
ful crossings may have influenced his perception of the risk. As Orasanu and 
Martin (1998, p. 103) observed with regard to aviation, 


If somewhat similar risky situations have been encountered in the past 
and the crew has successfully taken a particular course of action, they 
will expect also to succeed this time with the same CoA [course of 
action], for example, landing at airports where conditions frequently are 
bad, for example in Alaska. Given the uncertainty of outcomes, in many 
cases they will be correct, but not always. 


As noted, after the Taki-Tooo arrived at the bar and waited with three other 
vessels, the captains of those vessels crossed successfully. Vessel size and 
engine power affect vessel stability. The larger the vessel and the more pow- 
erful its engine, the more readily it can withstand rough seas. The first two 
vessels that crossed were larger than the Taki-Tooo and had more powerful 
engines. The one that crossed immediately before the captain attempted to
do so was about the same size as the Taki-Tooo and had an engine of
approximately the same size. Because the captain and his wife had owned that
vessel through the company that they had sold, and because he had operated it
before, the captain was familiar with its stability and handling
characteristics. Once that vessel crossed, the captain had evidence that a
comparably sized vessel could successfully cross the bar.

Nonetheless, despite the successful outcomes of the vessels that crossed 
before the Taki-Tooo captain’s attempt, those crossings were not uneventful. 
Some of those on board one of the vessels reported having sustained injuries
during the crossing. Most important, however, to the captain’s decision was the
nature of the dynamic conditions the sea presented and the length of the 
interval between deciding to cross and actually crossing the bar. Because 
the crossing could not be instantaneous, any decision to cross could not 
have been based on the sea state encountered during the crossing, given the 
dynamic sea conditions and the time needed to reach the bar from the point 
at which the decision to cross had been made. In terms of situation aware- 
ness, no captain at the bar could have had Level 3 situation awareness. The 
time interval between the decision and the encounter, and the severity of 
the sea state meant that situation awareness would be inaccurate, and any 
decision based on inaccurate situation awareness has a likelihood of being 
erroneous. Investigators conclude: 


... the decision of the Taki-Tooo master to cross the bar was probably
influenced by a host of factors, including the request of the passengers for his
services, his observations of sea conditions comparable to those he had 
seen before, his previous experience making the bar transit with this 
vessel, and his observation of the crossings of the other vessels before 
him. [However], no master can be assured that conditions encountered 
when crossing will be the same conditions as those observed when the 
decision to cross is made. The tragic consequences of his attempt to tran- 
sit the bar demonstrate the faultiness of his decision-making. (National 
Transportation Safety Board, 2005, p. 46) 





Summary 


Operators conduct a situation assessment to understand the system state and 
its operating environment. Situation awareness is the understanding opera- 
tors have of the system and its environment at any one time. It is based on an 
operator's obtaining critical system-related information, understanding the 
information and its description of the system state, and being able to proj- 
ect from the current system state to the near term. Equipment factors, such 
as display interpretability; operator factors, such as experience, knowledge, 
and skills; and company factors, such as training, affect situation awareness 
quality. The quality of situation awareness directly affects the quality of sub- 
sequent decision making. 

In general, operators use one of two types of decision-making processes. 
One, classical decision making, is applied primarily to relatively static situ- 
ations and the other, naturalistic decision making, is applied primarily to 
dynamic situations. In classical decision making the decision maker gener- 
ates options based on the nature of the situation, evaluates the costs and 
benefits of the options, and selects the one with the greatest benefits for the 
least cost. In naturalistic decision making, the decision maker quickly makes 
a decision by first recognizing the situation and then selecting an option that 
seems to work for that situation, even if it is not necessarily the “best” option 
that could follow a more thorough analysis. Decision-making biases influ- 
ence the quality of decisions made through either process. 


DOCUMENTING SITUATION AWARENESS 
AND DECISION MAKING 


SITUATION AWARENESS 


• Identify the information that the operator used or, if the operator
  is unavailable, was likely to have used to obtain situation
  awareness.

• Document equipment, operator, and company antecedents that
  could have affected the operator's understanding of the event.

• Document the system state from recorded data, operating manuals,
  personnel interviews, and other relevant data sources.

• Observe system operations, if possible, to determine the
  information upon which the operators relied.

• Interview operators, both critical and noncritical to the event,
  to learn of the techniques they use to understand the state of
  the system and its operating environment.

• Document the sources of information available to the operator,
  using the methods described in the preceding chapters.

• Identify the operator's previous encounters with similar
  scenarios in similar systems.

• Document the time the operator first perceived the critical
  situation and the time he or she responded.

• Compare the operator's perceptions of the events with the
  actual system state.


DECISION MAKING 


• Document the tasks that the operator performed, and the
  amount of time available to complete the tasks.

• Determine the information available, and the information that
  the operator used.

• Identify the operator's decisions and actions.

• Evaluate the effectiveness of the decisions in terms of their
  consequences as well as the decision-making process used.

• Examine training, programs, and procedures to identify
  deficiencies that could have led to adverse outcomes.

• Examine the circumstances the operator encountered and the
  extent to which they changed between the time a decision was
  made and the time it was implemented.


Automation











“David, I'm afraid.” 
Hal the computer to astronaut David Bowman as he was removing Hal's
higher cognitive powers, in Stanley Kubrick’s film, 2001: A Space Odyssey* 




Introduction 


Although many of the technological innovations that Stanley Kubrick
had envisioned in his landmark 1968 film, 2001: A Space Odyssey, have not
been realized, dramatic changes have nonetheless taken place in complex 
systems since then. Unlike Hal, the omniscient and omnipotent computer, 
automation today does not exercise absolute control over complex systems 
and the people who operate them. Operators still control complex systems, 
although their role has changed as automation has increased and grown 
more sophisticated. 

As systems have become more advanced, automation has performed an 
ever-larger share of both the manual and the cognitive tasks that operators 
had previously done themselves. The increased role of automation in sys- 
tems has enhanced many aspects of system operations, but it has also led to 
unique antecedents to errors, errors that have led to incidents and accidents. 

Two aircraft incidents, involving what today would be considered rela- 
tively simple automation, illustrate the type of operator errors that could 
result from the application of automation to complex systems. In each, the 
aircraft sustained substantial damage but the pilots were able to land safely. 
In 1979, a DC-10 experienced an aerodynamic stall and lost over 10,000 feet of 
altitude before the pilots recovered airplane control. They had inadvertently
commanded a control mode through the airplane's autopilot that called for a
constant rate of climb, but during the climb they did not realize that the
airspeed had decreased below the stall speed (National Transportation Safety
Board, 1980). 





* The quote, taken from the film, was not in the Arthur C. Clarke novel upon which the film 
was based. 

As discussed previously in this text, an aircraft experiencing an aerodynamic
stall develops insufficient lift to maintain flight. Unless pilots respond
quickly the aircraft will almost certainly lose altitude rapidly and crash. In 
this incident the pilots had engaged an automated flight mode that maintained
the pilot-selected climb rate. However, with no other changes to aircraft
control, a climbing aircraft can only maintain a constant climb rate
at the expense of its forward airspeed. At some point, the airspeed will be 
insufficient to develop lift and the airplane will stall. 
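
The trade-off described above can be written with the standard energy relation from flight mechanics. The relation is not taken from the accident report and is offered only as a sketch; the symbols are assumptions introduced here: T is thrust, D is drag, W is aircraft weight, V is airspeed, h is altitude, and g is gravitational acceleration.

```latex
\[
\frac{(T - D)\,V}{W} \;=\; \frac{dh}{dt} + \frac{V}{g}\,\frac{dV}{dt}
\qquad\Longrightarrow\qquad
\frac{dV}{dt} \;=\; \frac{g}{V}\left(\frac{(T - D)\,V}{W} - \frac{dh}{dt}\right)
\]
```

The left-hand side of the first equation is the excess power available per unit weight. If the autopilot holds the climb rate dh/dt fixed while that excess power falls below it, as happens when available thrust decreases with altitude, then dV/dt is negative and the airspeed decays toward the stall speed.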

The pilots had delegated airplane control to the autopilot, but did not effec- 
tively monitor the aircraft’s performance thereafter. Instead, they relied on 
it to perform accurately and reliably, but did not notice that the airplane’s 
airspeed had decreased below the minimum required to maintain lift. Although
three sources of data, presented visually, aurally, and tactilely, informed
them of their insufficient airspeed and the stall, none of the pilots
recognized that the airplane had stalled.

Several years later a Boeing 747 lost power on an outboard engine while in 
cruise flight (National Transportation Safety Board, 1986). Should an engine 
on an aircraft with four engines fail, the two engines on the other wing
would generate about twice the thrust that the remaining engine on the
affected wing could generate alone. Without corrective action, the
differential thrust would cause the
airplane to swing or “yaw” to the side of the failed engine. 

An engine failure is not a catastrophic event, so long as the pilots perform 
maneuvers that all pilots are trained to perform to maintain aircraft control. 
However, these pilots did not respond to the engine failure and did not per- 
form the necessary actions to correct the yaw. The autopilot continued to 
counteract the yaw in order to maintain the selected flight path. However, 
after several minutes, the autopilot could no longer maintain the heading 
and it automatically disengaged from airplane control. The airplane entered 
a steep dive and lost over 30,000 feet before it was recovered. The pilots nei- 
ther recognized nor responded effectively to the situation until the airplane 
had reached the end of the dive. 

In both instances, what today would be considered relatively primitive 
types of aircraft automation performed precisely as designed. The DC-10 
pilots failed to monitor the actions of the automation mode that they had 
selected, and the Boeing 747 pilots did not disengage the automation and 
take manual control of the airplane when necessary. Neither of the two pilot 
teams seemed to recognize that the automation, which had performed so 
reliably in the past, could also lead to catastrophe if not monitored. 

Since then the application of automation to complex systems has increased,
and more sophisticated types of automation have been applied, but operators have
continued to make errors interacting with the automation. For example, a 
highly automated aircraft that was introduced in the late 1980s and operated 
by different airlines, each with its own operating procedures, was initially 
involved in a number of fatal accidents due, at least in part, to operator errors 
in dealing with the automation. The relatively high number of accidents of 
this airplane type illustrates a phenomenon that seems to occur after
technologically advanced systems have begun service: a lengthy initial period
in which managers and regulators come to recognize how the new technol- 
ogy requires changes in the way operators deal with the system. Training 
programs and procedures are then modified in response to the effects of the 
technological advances. As Amalberti (1998) writes,


Any new technology calls for a period of adaptation to eliminate resid- 
ual problems and to allow users to adapt to it. The major reason for this 
long adaptive process is the need for harmonization between the new 
design on one hand and the policies, procedures, and moreover the men- 
talities of the...system on the other hand. (p. 173) 


To better understand how automation can affect operator performance, the 
nature of automation itself will be examined, and effects of automation on 
system operations reviewed. 





Automation 


Automation can mean many things to many people. Billings (1997) defines 
automation as the replacement by machines of tasks that humans had 
previously performed. Moray, Inagaki, and Itoh (2000) define automation as 
“any sensing, detection, information-processing, decision making, or con- 
trol action that could be performed by humans but is actually performed by 
machine” (p. 44). 

Automated systems can perform a wide range of both manual and cogni- 
tive tasks, ranging from minimal to complete system control. Indeed, when 
planning complex systems, designers decide on the level of automation 
to bring to the system. Parasuraman (2000) and Parasuraman et al. (2000) 
describe up to 10 levels of automated control that designers can incorporate 
into a system, from a fully manual system to complete automated system 
control. In the lowest level, the operator performs all tasks, and in the highest 
level the automation makes all decisions and takes all actions independent 
of, and without communicating with, the operator. The levels in between 
range from automated sensing and detection, to offering operators decision 
alternatives, to deciding for the operator, to the highest level, complete con- 
trol. They recommend that designers consider the level of automation that 
is optimum for the operator tasks they wish to automate, and the effects of 
the automation level on the operator’s ability to perform the requisite tasks. 

Others argue that the severity and immediacy of error consequences 
should dictate the level of automation implemented. Moray et al. (2000) 
believe that the optimal level of automation depends on such elements as the 
complexity of the system, the risk of a fault, and the dynamics of an event. 
They suggest that when an immediate response to a system fault is needed, 
an automated response will be superior to a human one. However, to avoid 
unnecessary and quite costly system shutdowns in situations that are not 
time critical, they suggest that operators, not the automation, retain ultimate 
control of the system. 

Today the variety of automated systems employed and their levels of 
automated control are considerable. Automation can function as a single 
subsystem or as a constellation of subsystems operating interdependently. 
Despite the range of possible automation functions available, 
automated systems currently perform four functions: acquiring information, 
analyzing information, selecting actions based on that analysis, and 
implementing the action, as needed (Sheridan and Parasuraman, 2005). The degree 
and level at which the four functions are conducted vary across systems, 
along with their degree of independence from the operators. Today, after 
extensive research into automation’s effects on operator performance and 
several automation-related events, many recognize that automation has had 
both positive and negative effects on operator performance. 
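
The four functions named above can be pictured as the stages of a simple pipeline. The Python sketch below is illustrative only: the temperature-like scenario, the function names, and the 5-unit threshold are assumptions made for this example and are not drawn from Sheridan and Parasuraman (2005). The `automation_acts` flag stands in, very coarsely, for the degree of independence from the operator that a designer might grant.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A single sensed value (hypothetical units)."""
    value: float


def acquire(raw_sensor_value: float) -> Reading:
    # Function 1: acquire information from a sensor.
    return Reading(value=raw_sensor_value)


def analyze(reading: Reading, target: float) -> float:
    # Function 2: analyze the information, here as deviation from a target.
    return reading.value - target


def select_action(deviation: float, threshold: float = 5.0) -> str:
    # Function 3: select an action based on the analysis.
    if deviation > threshold:
        return "decrease"
    if deviation < -threshold:
        return "increase"
    return "hold"


def implement(action: str, automation_acts: bool) -> str:
    # Function 4: implement the action, or only advise the operator,
    # depending on how much independence the automation has been given.
    if automation_acts:
        return f"automation executes: {action}"
    return f"operator advised: {action}"


def run_cycle(raw: float, target: float, automation_acts: bool) -> str:
    """Run one pass through all four functions."""
    deviation = analyze(acquire(raw), target)
    return implement(select_action(deviation), automation_acts)
```

In this sketch the same sensed value leads either to an automated action or to advice for the operator, which is one simple way to see how the four functions can be held constant while the level of automated control varies.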





Automation Advantages and Disadvantages 
Benefits 


There is little question that automation has enhanced many aspects of 
complex system operations. Wiener and Curry (1980) and Wiener (1989), 
examining the effects of automation in the aviation environment, believe 
that these enhancements resulted from a combination of technological, 
economic, and safety factors, not all of which have been realized. For 
example, automation can reduce operator workload, enabling operators to 
attend to “higher level” activities, such as system monitoring and 
troubleshooting. Automation can also raise operator productivity, decreasing 
the number of operators needed and lowering operating costs. Automated 
systems are also highly reliable and 
can control system performance with considerable accuracy. In aviation and 
marine settings, for example, automated navigation systems can maintain 
operator-selected courses or tracks with little or no deviation, and with little 
or no operator involvement necessary to ensure the accuracy of the main- 
tained course. 

Modern systems also offer flexibility to the design of both displays and 
controls. This enhances safety by enabling designers to display readily 
interpretable and accurate system-related information, benefitting 
situation awareness and increasing operators’ abilities to recognize and 
respond effectively to system anomalies. Digital displays can integrate and 
present data with fewer gauges, and in a more interpretable manner than 
could be done with analog gauges, and controls can be designed to better 
match the needs of operators than in older, nonautomated systems. 


Shortcomings 


Automation has also brought about negative consequences that, on occasion, 
have adversely affected operator performance and increased opportunities 
for error. In some ways, the benefits of automation, such as its high reliability 
and accurate system control, can be seen to actually work against operator 
performance. As Jamieson and Vicente (2005) note: 


The use of automation in complex sociotechnical systems has proved 
to be a double-edged sword. It is a technology that, perhaps more so 
than any other, speaks with a forked tongue to system designers. On 
the one hand, it promises unprecedented reliability, reduced workload, 
improved economy, and fewer errors. On the other hand, it whispers of 
less tangible, but no less real, costs to operators in terms of skill degrada- 
tion, mental isolation, and monitoring burdens...(p. 12) 


Researchers have identified several effects of automation that can poten- 
tially work against operator performance, and thus create opportunities for 
errors. 

User Interface. Some automation applications have reduced the types and 
amount of data that operators had depended upon for system performance 
feedback, thereby reducing operators' awareness of the system state (e.g., 
Norman, 1990; Billings, 1997). Feedback reduction can be seen in several 
highly automated aircraft types. On older aircraft, the two interconnected 
control columns that pilots used to control the flight path also enabled 
each pilot to observe the other’s control column movements through 
corresponding movement in his or her own controls. On these newer types, 
the columns have been replaced by control sticks with no corresponding 
movement: moving one does not move the other. Pilots using 
these controls cannot rely on tactile and visual feedback from control col- 
umn movement to recognize the other pilot’s inputs as they could on older 
models. Rather, they must focus on flight displays and interpret the data to 
recognize the results of changes to aircraft controls. 

Similarly, in older aircraft pilots move throttles or control levers for- 
ward or back to increase or decrease engine thrust. On aircraft equipped 
with autothrottles, engine thrust is automatically maintained, varying in 
response to pilot selected performance parameters, environmental con- 
ditions, design limitations, and operating phase. On most autothrottle- 
equipped aircraft, pilots have two sources of information to inform them 
of autothrottle commanded changes in engine thrust: forward and aft 
throttle movement and engine-related data displays. However, on some 
advanced aircraft stationary throttles have replaced moving throttles, thus 
eliminating visual and tactile cues of throttle movement that pilots had 
relied upon to detect engine thrust changes. This has reduced the available 
sources of information on engine thrust changes to one source, visually 
presented information from engine displays. Worse, with changes to both 
controls pilots have been forced to rely on their foveal or central vision to 
learn of changes in engine operation, rather than their peripheral vision. 
Because peripheral vision is more sensitive to changes in movement than 
central vision, control changes have become even less detectable than they 
had been before. 

Automation has also changed the design of system controls by making 
extensive use of keyboards, touchscreens, trackballs, and other control 
types, rather than the larger and more defined levers, pulleys, and manual 
controls found on older systems. In routine situations in which operator 
workload is predictable, controlling the system through keyboard or touch- 
screen will likely not affect operator performance. However, in nonroutine 
situations, when operator workload is likely to be high, interacting with the 
automated controls can be cognitively demanding, reducing operator aware- 
ness of peripheral visual cues and increasing workload further in inoppor- 
tune circumstances. 

For example, in automated systems activities can be programmed 
through a small keyboard, potentially reducing workload because further 
control inputs would not necessarily be needed, so long as the programmed 
activities do not change. However, should they need to change, operators 
will have to execute numerous keystrokes again, possibly during periods 
of high workload. During unexpected or nonroutine situations, attending 
to the keyboard and inputting keystrokes increases workload substantially 
over the steps that would be required in less-automated systems, and the 
increase in operator workload can have adverse effects on operational 
safety. 

System Transparency. Few operators are aware of the design or logic of 
the software and the contents of the algorithms and databases that guide 
the automation of the systems they operate. As many software users do, 
rather than understanding a program’s underlying design, operators strive 
to become sufficiently familiar with its application, either through formal 
training, experience, or both, to operate it as needed. In most circumstances, 
the lack of automation logic transparency, what Woods, Johannesen, Cook, 
and Sarter (1994) describe as the “opaque” nature of automation, will not 
adversely affect operator performance. However, in the event of a system 
anomaly, operators’ unawareness of the reasons for the actions of the auto- 
mation, or the inability to predict its next actions, degrades their ability to 
diagnose and respond. As Billings (1997) notes, “regardless of the cause, the 
net effect [of this] is diminished awareness of the situation, a serious problem 
in a dynamic environment” (p. 188). 

Further, automation opacity can make operators reluctant to intervene should 
they become uncertain of the automation outcomes to expect. Sarter and 
Woods (2000) found that when airline pilots were faced with unexpected 
automation actions, they hesitated to become more involved in the system's 
operation. Instead, most persisted in attempting to understand the nature 
of the error, even to the point of not monitoring the airplane’s operation and 
allowing it to enter potentially dangerous conditions. 

In some systems the automation is sufficiently independent that it can 
engage one of multiple operating modes without operator input or guidance. 
Each operating mode offers capabilities specific to the needs of the various 
operating phases, but the system may not effectively inform operators of the 
identity of the mode that is engaged. Several researchers have found that 
operators were often unaware of the system’s operating mode, a potentially 
critical element of situation awareness in any complex system (e.g., Sarter 
and Woods, 1995, 1997, 2000; Degani, Shafto, and Kirlik, 1999). 

Sarter and Woods (1997) characterize mode changes and related phenom- 
ena that operators do not expect as “automation surprises,” which, “begin 
with misassessments and miscommunications between the automation and 
the operator(s), which lead to a gap between the operator’s understanding 
of what the automated systems are set up to do and how the automated sys- 
tems are or will be handling the underlying process(es)” (p. 554). They sug- 
gest that automation surprises are based on poor operator mental models 
of automation, as well as low system observability during highly dynamic 
or nonroutine situations. By itself the loss of mode awareness, that is, situa- 
tion awareness with regard to system operating mode, can create opportu- 
nities for error. However, in combination with automation opacity, loss of 
mode awareness can considerably reduce operator situation awareness and 
enhance opportunities for error. 

Several of these automation effects are evident in the 
December 1995 accident involving a Boeing 757 that crashed near Cali, 
Colombia (Aeronautica Civil of the Government of Colombia, 1996). The 
crew was using the airplane’s automated flight management system to con- 
trol the flight. The captain misinterpreted an air traffic controller’s clearance 
to Cali and reprogrammed the aircraft automation to fly directly to the Cali 
radio navigation beacon rather than to waypoints located short of the field, 
as the approach procedure had required. When told to report passing over 
a waypoint in between, both pilots were unaware that the captain had inad- 
vertently deleted the critical waypoint and all intermediate waypoints from 
the automated flight path control by establishing the direct course to the Cali 
beacon. 

After repeated, unsuccessful attempts to locate the critical waypoint 
through the automation, they decided to fly to a waypoint just short of the 
field, again by reprogramming the automation to fly the new flight path, 
during an already high-workload period. However, they were unfamiliar 
with the designation of navigation data stored in the airplane’s navigation 
database, and inadvertently established a course away from Cali. After the 
crew recognized their error and turned back toward Cali, the airplane 
struck a mountain. 

Monitoring, Vigilance, and Situation Awareness. Automation has helped 
to distance the operator from many system-related cues. Norman (1981, 
1988) believes that in automated systems operators may no longer directly 
observe the system, hear its sounds, or feel its movement. Instead they 
monitor the data that automated sensors detect and display, which may 
not effectively convey the needed information. This diminishes their mode 
awareness and decreases their ability to respond effectively to unexpected 
system states. Further, monitoring displays over extended periods is fatigu- 
ing. Operators lose the ability to accurately detect and respond to system 
anomalies after prolonged periods of monitoring (e.g., Wiener and Curry, 
1980; Molloy and Parasuraman, 1996; Parasuraman, Mouloua, Molloy, and 
Hilburn, 1996). 

Researchers have demonstrated that increasing automation and decreas- 
ing operator involvement in system control reduces operator ability to main- 
tain awareness of the system and its operating states. Endsley and Kaber 
(1999) found that among various levels of automation, people perform best 
when actively involved in system operation. Endsley and Kiris (1995) term 
the reduced operator involvement in system control in highly automated 
systems the “out-of-the-loop performance problem.” They argue that auto- 
mation leads to reduced operator ability to recognize system anomalies as 
a result of (1) reduced vigilance and increased complacency from monitor- 
ing instead of active system control, (2) passive receipt of information rather 
than active information acquisition, and (3) loss or modification of feedback 
concerning system state. 

The investigation of a 1997 accident involving an automated turboprop 
aircraft, an Embraer Brasilia, supports these conclusions (National 
Transportation Safety Board, 1998). The pilots did not recognize that the 
wings of their aircraft had become contaminated by ice, degrading its aero- 
dynamic characteristics. The autopilot, a sophisticated flight management 
system, attempted to maintain the selected flight path of the increasingly 
unstable aircraft. 

Because the pilots were not directly controlling the airplane they had no 
tactile feedback from the movement of the control column. Only two sources 
of visual information were available to inform them of the airplane’s increas- 
ing loss of lift, an airplane attitude display and the autopilot-induced control 
column movements, which corresponded to what would have been pilot- 
induced control column movements. The pilots did not perceive these cues 
and so did not recognize that the airplane was about to stall. 

The autopilot reached the limit of its control ability and disengaged. The 
airplane quickly went into a turn and then a dive, and the pilots were unable 
to regain aircraft control. Had they been controlling the airplane manu- 
ally, the tactile cues of the control column forces would have been far more 
perceptible than were the visual cues of the displays because they would 
have felt the control column forces or seen them through their peripheral 
vision, unlike the visual displays that required direct monitoring. With the 
movement of the control column the pilots would have recognized that the 
airplane was approaching a stall. With that information they would have 
likely responded in sufficient time to avoid the accident. 

Workload Alteration. Automation has generally reduced operator workload, 
but it has often done so during already low-workload operating phases, 
and it has increased it during already high-workload phases. Woods (1996) 
has described this redistribution of workload as “clumsy automation” (also 
Kantowitz and Campbell, 1996), a phenomenon that increases rather than 
decreases opportunities for operator errors. As Woods (1996) explains, 


A form of poor coordination between the human and machine in the 
control of dynamic processes where the benefits of the new technology 
accrue during workload troughs and the costs or burdens imposed by 
the technology (i.e., additional tasks, new knowledge, forcing the user to 
adopt new cognitive strategies, new communication burdens, new atten- 
tional demands) occur during periods of peak workload, high criticality 
or high tempo operations... (p. 10) 


Yet even simply reducing workload can also degrade operator perfor- 
mance if this occurs during already low workload periods. Excessively 
reduced workload over extended periods can increase boredom and 
increase operator difficulty in maintaining vigilance (O'Hanlon, 1981). As 
noted in Chapter 6, operator alertness decreases over extended periods of 
relative inactivity, increasing the subsequent effort needed to detect system 
anomalies. 

Trust, Bias, and Skill Degradation. Automation can perform so reliably and 
accurately that over time operators’ interactions with the automation change. 
As system automation increases, the number of tasks that it performs 
more accurately and reliably than operators do grows. This has increased 
operator trust in the automation’s ability to perform those tasks. Yet, as this 
trust grows, their confidence in their own abilities to perform the same tasks 
manually may decrease (e.g., Lee and Moray, 1992). 

Researchers have explored the relationship between automation and 
operator trust. Parasuraman and Riley (1997) found that as operator trust in 
the automation grows, they increasingly delegate responsibility for system 
monitoring to the automation. At the same time, their vigilance and ability 
to recognize system faults may decrease because their expectation of, and 
preparedness for, system faults decreases with their growing trust in the 
automation. 

Moray, Inagaki, and Itoh (2000) note that operator trust in a system depends 
primarily on its reliability. They suggest that with less than about 90% reli- 
ability trust in the automation falls off considerably. By contrast, operators’ 
self-confidence in their own ability to operate the system depends not on the 
system but on their experiences with the system. Paradoxically, the high 
reliability and accuracy of automation make it more, rather than less, likely that 
operators will fail to effectively monitor automated systems as they come to 
rely on them more and more, and on themselves less and less. 

Mosier and Skitka (1996) suggest that the high degree of automation reli- 
ability and accuracy can, over time, lead operators to put more faith in auto- 
mated system guidance than in their own experience and training, and 
reduce their vigilance. This can lead operators to overlook problems that 
the automation fails to detect, or to unquestioningly follow the guidance 
that automation offers, even when the guidance is inappropriate. Mosier 
and Skitka (1996) refer to this excessive trust and confidence in automation 
as “automation bias.” Operators can over-rely on the automation in highly 
automated systems, much as team members can over-rely on other operators 
in their team (see also Mosier, Skitka, Heers, and Burdick, 1998; Skitka and 
Mosier, 2000). 

Bainbridge (1983) and Billings (1997) point out that reliance on automation 
for monitoring and decision making can erode operator skills, increasing the 
likelihood of error in the event of a system fault. Bainbridge terms this an 
"irony of automation" because, 


When manual takeover [of a system] is needed there is likely to be some- 
thing wrong with the process, so that unusual actions will be needed to 
control it, and one can argue that the operator needs to be more rather 
than less skilled, and less rather than more [task] loaded, than average. 
(p. 272) 


An accident that occurred in Columbus, Ohio, in 1994, in which a Jetstream 
J-41, an automated turboprop airplane, crashed just short of the runway, 
illustrates how insufficient operator self-confidence and excessive trust in 
automation can lead to critical errors (National Transportation Safety Board, 
1994). At the time of the accident, the weather was poor and visibility limited, 
conditions that are often quite demanding, thus increasing pilot workload. 
Each pilot had reason to lack confidence in his own operating skills. The 
first officer, with little experience operating highly automated aircraft, had 
only recently been hired. The captain, though experienced in the aircraft, 
had demonstrated deficiencies in several failed check or examination flights. 

The captain had programmed the airplane's flight management computer 
(FMC) and engaged it to fly the precise flight path to an approach and land- 
ing. However, although the FMC could accurately fly a preprogrammed 
three-dimensional flight path, it controls only the flight path, unlike automa- 
tion of larger air transport aircraft. On this airplane, the pilots and not the 
automation control the aircraft's airspeed. 

The captain had delegated flight path control to the automation, but then 
failed to effectively monitor the airspeed. The airplane flew precisely along 
the flight path, until its airspeed decayed and it experienced an aerodynamic 
stall. The pilots were unable to recover the airplane. The captain's history 
of piloting deficiencies contributed to his reliance on the automation. With 
apparently greater confidence in the airplane's automation than in his own 
abilities, he delegated flight path control to the automated flight management 
system. Although this was not in itself an error, he then failed to adequately 
monitor the airplane's airspeed, apparently focusing primarily on its flight 
path, and this failure led to the accident. 

The captain's actions on this flight are consistent with Riley's (1996) obser- 
vations that the reliability of automation itself influences an operator's deci- 
sion on task assignment. As he observed, 


If the operator had more confidence in his or her own ability to do that 
task than trust in the automation, the operator was likely to do the task 
manually, whereas if the operator's trust in the automation was higher 
than the operator's self-confidence, the operator was likely to rely on the 
automation. (p. 20) 
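
Riley's observation amounts to a simple comparison, which can be sketched in a few lines of Python. The function name and the 0-to-1 rating scales are assumptions made for illustration; Riley's statement is qualitative and does not say what happens when the two quantities are equal, so the sketch leaves that case undetermined.

```python
def likely_task_assignment(trust_in_automation: float,
                           self_confidence: float) -> str:
    """Return the assignment Riley's observation would predict.

    Both arguments are hypothetical ratings on a 0-to-1 scale.
    """
    if self_confidence > trust_in_automation:
        # Operator trusts his or her own ability more: do the task manually.
        return "manual"
    if trust_in_automation > self_confidence:
        # Operator trusts the automation more: rely on the automation.
        return "automation"
    # The quoted observation does not cover the case of equal ratings.
    return "undetermined"
```

Applied to the Columbus accident, a captain with low self-confidence and high trust in the flight management computer would fall in the "automation" branch, consistent with his decision to delegate flight path control.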


Team Performance. Researchers have suggested that automation can be con- 
sidered to be a member of an operator team, altering the role of the team 
members. Scerbo (1996) argues that an automated subsystem can coordinate 
activities, be guided by a coach, perform functions without causing harm, 
provide necessary information when needed, and otherwise perform the 
types of tasks that human operators typically perform. Paris, Salas, and 
Cannon-Bowers (2000) contend that automation can replace all or part of 
team functions, leading to restructured teams and redefined team member 
roles. As Woods (1996) notes, “introducing automated and intelligent agents 
into a larger system in effect changes the team composition. It changes how 
human supervisors coordinate their activities with those of the machine 
agents” (p. 4). 





Automation-Related Errors 


New technology can engender changes in complex systems and operator 
interaction with those systems that, using Reason's (1990, 1997) terms, lead to 
latent errors or latent conditions that, in turn, create antecedents to operator 
errors. Although regulators, operators and companies have learned to adapt 
to automation effects, certain commonalities in automation-related errors 
have emerged. Operators committing automation-related errors often fail to 
effectively monitor the systems, or to understand the effects of their actions 
or those of the automation. 

These types of errors were seen in several previously described acci- 
dents. In the 1972 accident discussed in Chapter 9, for example, involving 
the Lockheed L-1011 that crashed in the Everglades, the pilots had engaged 
the automation to maintain a flight path at a prescribed altitude (National 
Transportation Safety Board, 1973). Several minutes later they inadvertently 
disengaged the automation’s altitude hold feature and the airplane began to 
descend, but they were unaware of this. After delegating flight path control 
to the automation, they attended to a system anomaly but did not monitor 
the flight path. They did not realize that the automation had ceased main- 
taining the selected altitude. 

Similarly, in a 1992 accident involving an Airbus A-320 that crashed while 
on approach to Strasbourg, France (Commission of Investigation, 1994), the 
pilots established a 3300-foot-per-minute descent rate, several times faster 
than a standard descent rate. However, investigators believed that they 
had actually intended to establish a 3.3° descent angle, a flight path angle 
that would have corresponded to the actual descent path called for in the 
approach, unlike the one the aircraft actually flew. As with the L-1011 acci- 
dent, after programming the flight path the crew failed to monitor a criti- 
cal aspect of the aircraft's performance, the increasingly rapid descent rate. 
Although they had established the descent rate through their actions, they 
did not recognize it and did not attempt to reduce it before the accident. 
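
The gap between the two settings can be made concrete with a little trigonometry: the vertical speed that corresponds to a given flight path angle is the groundspeed multiplied by the tangent of the angle. The Python sketch below works this out. The 145-knot groundspeed is an assumed, illustrative approach speed and is not a figure taken from the accident report.

```python
import math

FEET_PER_NAUTICAL_MILE = 6076.12


def descent_rate_fpm(groundspeed_knots: float, angle_degrees: float) -> float:
    """Vertical speed, in feet per minute, for a given flight path angle."""
    # Convert groundspeed from nautical miles per hour to feet per minute.
    feet_per_minute_over_ground = groundspeed_knots * FEET_PER_NAUTICAL_MILE / 60.0
    return feet_per_minute_over_ground * math.tan(math.radians(angle_degrees))


# Assumption for illustration only: a 145-knot groundspeed on approach.
ASSUMED_GROUNDSPEED_KNOTS = 145.0

intended_rate = descent_rate_fpm(ASSUMED_GROUNDSPEED_KNOTS, 3.3)
ratio = 3300.0 / intended_rate

print(f"3.3-degree path: about {intended_rate:.0f} ft/min")
print(f"3300 ft/min is about {ratio:.1f} times that rate")
```

Under this assumed speed, a 3.3-degree path works out to roughly 850 feet per minute, so a 3300-foot-per-minute descent is about four times steeper than the path the crew apparently intended.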





Case Study 


On June 10, 1995, the cruise ship Royal Majesty, en route from Bermuda to 
Boston, Massachusetts, grounded off the United States coast, causing over 
$7 million in damages to the vessel (National Transportation Safety Board, 
1997). 


The Navigation System 


Early in the voyage a crewmember checked the ship's navigational equipment, 
a system that included a GPS (Global Positioning System) antenna and 
receiver, to assure that the ship was following the correct course. The GPS 
system receives signals from a series of satellites and uses them to derive 
highly accurate position information. The vessel was also equipped with 
an integrated bridge system that combined GPS data with other naviga- 
tion information to steer the vessel along a preprogrammed course, while 
compensating for wind, current, and sea state. The integrated bridge system 
displayed the ship's derived position on a video screen in grid coordinates. 
The operators were confident that the displayed, GPS-derived position was 
accurate. 

The system was designed to automatically default to dead reckoning 
navigation, a method that did not compensate for wind, current, and sea state, 
in the event that GPS satellite signals became unavailable. Because it lacks 
the accuracy of the GPS, dead reckoning requires regular crew attention to 
ensure that the ship maintains the desired course, unlike GPS-based 
navigation. In the event that the system defaulted to dead reckoning, the 
integrated bridge system would emit a series of aural chirps for one second 
to alert the 
crew that it had reverted to the default navigation mode. It would also dis- 
play on the video screen “DR” for dead reckoning, and “SOL” for solution. 
The font sizes used for DR and SOL were considerably smaller than those 
used for the position coordinates. 

The crewmembers who used the integrated bridge system, the master, the 
chief officer, the second mate, and the navigator, had not used this type of 
system before their assignment to the Royal Majesty, and the cruise line had 
not formally trained them in its use. The ship's officers learned to operate the 
system by reading the relevant manuals and receiving on-the-job training 
from an officer experienced in the system. 


The Accident 


After the ship departed Bermuda, the cable that connected the GPS antenna 
to the receiver separated. As a result, the integrated bridge system could 
not receive GPS signals and it defaulted to dead reckoning navigation, as 
designed. It then continued to navigate and steer the vessel in this mode, but 
its course began to deviate from the intended one, until the vessel grounded 
17 miles off course. 

Investigators identified several errors that the watch officers, responsible 
for monitoring the vessel and its course, had committed. The officers did not 
understand the “DR” and “SOL” messages that the system displayed, and 
had not attempted to learn their meaning. Therefore, they did not recognize 
that the system had ceased to receive GPS data for navigation and course 
control and they were unaware that it had defaulted to the less accurate dead 
reckoning method. Although they had regularly checked the bridge system's 
display to confirm that the vessel was following the programmed course, 
they did not verify that the course that was displayed corresponded to the 
actual one. 

If the system had been navigating by GPS, the programmed and the actual 
course would have matched. However, because of the limitations of dead 
reckoning, without crew intervention the courses were likely to diverge 
when the system used that navigation method, and the vessel increasingly 
deviated from its intended course. 
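
Why the courses diverge can be seen in a small simulation. Dead reckoning estimates position from heading, speed, and elapsed time alone, so any uncompensated set from current or wind accumulates as position error. In the Python sketch below, all values (a 14-knot speed, a 0.5-knot easterly current, a 30-hour run) are hypothetical round numbers chosen for illustration; they are not figures from the Royal Majesty investigation.

```python
import math


def dead_reckoning_step(x_nm, y_nm, heading_deg, speed_knots, hours):
    """Advance a position using heading and speed only.

    Heading is measured clockwise from north; x is east, y is north,
    both in nautical miles.
    """
    heading = math.radians(heading_deg)
    return (x_nm + speed_knots * hours * math.sin(heading),
            y_nm + speed_knots * hours * math.cos(heading))


def position_error_nm(hours_total, heading_deg, speed_knots,
                      current_east_knots, step_hours=1.0):
    """Distance between the dead reckoning estimate and the true position."""
    estimated = (0.0, 0.0)
    actual = (0.0, 0.0)
    for _ in range(int(round(hours_total / step_hours))):
        # The estimate knows nothing about the current.
        estimated = dead_reckoning_step(*estimated, heading_deg,
                                        speed_knots, step_hours)
        # The actual vessel moves the same way, plus an easterly set.
        x, y = dead_reckoning_step(*actual, heading_deg,
                                   speed_knots, step_hours)
        actual = (x + current_east_knots * step_hours, y)
    return math.hypot(actual[0] - estimated[0], actual[1] - estimated[1])


# Hypothetical values for illustration only.
error_nm = position_error_nm(hours_total=30, heading_deg=0,
                             speed_knots=14, current_east_knots=0.5)
print(f"Position error after 30 hours: {error_nm:.1f} nautical miles")
```

Even a modest, steady set produces an error of many miles over a day or more, which is why dead reckoning calls for periodic position fixes from an independent source.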

Investigators determined that several aspects of the crew’s use of the 
automation led to their errors, and that the automation had fundamentally 
affected the crewmember roles. As they conclude (National Transportation 
Safety Board, 1997), 


Bridge automation has also changed the role of the watch officer on 
the ship. The watch officer, who previously was active in obtaining 
information about the environment and used this information for 
controlling the ship, is now “out of the control loop.” The watch officer 
is relegated to passively monitoring the status and performance 
of the automated systems. As a result...the crewmembers of the Royal 
Majesty missed numerous opportunities to recognize that the GPS 
was transmitting in DR mode and that the ship had deviated from its 
intended track. 

[Further,] the watch officers on the Royal Majesty may have believed 
that because the GPS had demonstrated sufficient reliability for 3½ 
years, the traditional practice of using at least two independent sources 
of position information was not necessary. 

Notwithstanding the merits of advanced systems for high-technology 
navigation, the Safety Board does not consider the automation of a bridge 
navigation system as the exclusive means of navigating a ship, nor does 
the Board believe that electronic displays should replace visually verifi- 
able navigation aids and landmarks. The human operator must have the 
primary responsibility for the navigation; he must oversee the automation 
and exercise his informed judgment about when to intervene manually. 
(Emphasis added, pp. 34 and 35) 





Summary 


Automation, in which automated system components perform tasks that 
operators had previously performed themselves, has both enhanced 
and degraded system safety. Many aspects of automation have affected 
the role of the operator, but some have created unique antecedents to error. 
Automation’s high reliability and accuracy can lead operators to excessively 
rely on it, degrading their vigilance and system monitoring skills. As opera- 
tors repeatedly experience the beneficial aspects of automation, they may 
delegate tasks to it without proper monitoring to ensure that the system per- 
forms as directed. 

Some operators have demonstrated greater trust in the abilities of the 
automation to control the system than in their own abilities. This may lead 
them to accept automation guidance unquestioningly, or to overlook 
problems that the automation has failed to detect. 


DOCUMENTING AUTOMATION-RELATED ERRORS 


* Evaluate automated system displays and controls in accordance 
with the criteria listed in Chapter 4. 

* Describe the specific functions of the automated system, its 
capabilities in system monitoring and control, and the nature of its 
presentation of system-related data. 

* Identify the operator's experience with automated systems, 
including the length of time operating the systems, and the 
level or extent of automation the operator typically was 
familiar with when operating the system. 

* Document the tasks the automation performs, its information 
sources, the results of its information processing, and the level 
of operator input and control over these tasks. 

* Record operator actions and decisions involving automation, 
and the type of automation-related error(s) committed. 

* Describe the tasks that the operators delegated to the 
automation, and the extent to which the operators monitored critical 
system parameters. 

* Examine the company’s training and procedures in automation 
use, and interview and observe operators, if possible, to 
learn the practices that they employed with regard to 
automation-related interactions. 
Case Study 














Introduction 


Many errors and their antecedents have been examined in this text, and 
numerous accidents were cited to illustrate the nature of the errors that led 
or contributed to those accidents. One accident in particular demonstrates 
several points about errors in complex systems. This accident also describes 
challenges that investigators could face, highlights the data gathering and 
analysis processes used to identify critical errors and their antecedents, and 
identifies recommendations that investigators can suggest to remediate sys- 
tem deficiencies highlighted in an investigation. 





The Accident 


On July 6, 2013, a Boeing 777-200ER crashed while on approach to San 
Francisco International Airport, destroying the aircraft and injuring 52 of 
those onboard, three of them fatally (National Transportation Safety Board, 
2014). The flight originated in Inchon, South Korea, destined for San Francisco. 
The airplane was later found to have been flying at an airspeed of 110 knots 
when it struck the edge of the runway, about 20 knots slower than it should 
have been at that point. Only seconds before the accident, when it was too 
late to avoid it, the pilots recognized the low speed. Up to that point they had 
not noticed that the autothrottle mode they had selected had changed and it 
was no longer maintaining the selected airspeed. Investigators determined 
that the accident was caused, in part, by “the flight crew’s inadequate 
monitoring of airspeed” (p. 129). 

Airspeed control, with that of altitude and position, is critical to safe flight. 
Only minimal variations from the appropriate speed (based on flight phase, 
airplane weight, and atmospheric conditions) are acceptable at any point in 
the flight, but especially on approach and landing when even small varia- 
tions from the appropriate airspeed can be catastrophic. Airspeeds even a 
few knots too slow can lead to an aircraft stall and/or excessive descent rate. 
Background 


After investigators read out the airplane’s flight data recorder and 
determined that the airplane’s airspeed was too slow while on approach to the 
runway, they concluded that the slow airspeed could have had only one of 
two possible causes: an airplane-related failure, whether in its engines, 
engine-related systems, or the autothrottle/autopilot system that 
maintained the pilot’s selected airspeed; or a pilot error, in failing either to 
monitor and correct the slow airspeed or to select the proper airspeed for the 
approach. 

At the accident site investigators quickly examined the engines and found 
that the internal engine damage in both engines was consistent with that 
of engines producing power at the time of impact; therefore, they deter- 
mined that the B-777's engines had not failed before the accident (Figures 
16.1 and 16.2). Investigators, reading the flight recorders and other sources 
of data, found that the pilots had entered the appropriate airspeed into 
the autothrottle and the autothrottle had performed as designed. Thus, an 
airplane-related anomaly was ruled out and the focus then centered on the 
pilots’ performance, including their training and understanding of the auto- 
throttle system, company procedures on autothrottle use and monitoring, 
and on the design of the airplane’s automated systems. Given the criticality 
of airspeed to flight safety, the issue of how pilots could lose airspeed 
awareness on landing became critical. Pilots are taught, from the very beginning 
of their flight training, to maintain close control of airspeed and to tolerate 
only minimal variation in airspeed through all flight phases, but especially 
during takeoff, climb, and approach and landing. 

Moreover, commercial airplane accidents are rare events that receive 
considerable attention. Airlines, regulators, and airframe and engine 
manufacturers study each major accident and, as investigators’ findings are 
announced, modify their training, procedures, advisories to customers or 
pilots, or oversight as necessary to minimize the likelihood that accidents 
and the associated operator errors are repeated. The industry’s 
implementation of accident investigation lessons is one factor that has made 
the worldwide accident rate as low as it has become. 

Yet, before this accident, several previous accidents had occurred in which 
the pilots of autothrottle-equipped aircraft, that is, aircraft with automated 
systems that monitored and controlled airplane airspeed, lost airspeed aware- 
ness on landing. Almost 30 years before this accident a DC-10 touched down 
in New York at an airspeed 30 knots too fast, destroying the airplane and 
leading to 12 passenger and crewmember injuries (National Transportation 
Safety Board, 1984). Further, about 3 years before the San Francisco accident, 
a highly automated airplane was at too low an airspeed on its approach to 
Amsterdam. Nine passengers and crew were killed and the airplane was 
destroyed when the airplane struck the ground about a mile short of the 
runway (Dutch Safety Board, 2010). At too high an airspeed on approach 
and landing, an air transport airplane will likely be unable to be brought 
to a stop on the runway; at too low a speed, the airplane will likely stall or 
descend too rapidly into the runway. Despite these accidents involving 
failure to monitor airspeed, and considerable research on the potential hazards 
of operator interaction with highly automated systems, the pilots of the B-777 
allowed the airspeed to deteriorate to the point that the airplane descended 
rapidly and struck a seawall at the edge of the runway (the runway had been 
partially extended into San Francisco Bay). 

FIGURE 16.1 
The impact point of the B-777 at the seawall, just before the runway at San 
Francisco International Airport. Wreckage debris path continued onto the 
runway. (Courtesy of the National Transportation Safety Board.) 

FIGURE 16.2 
Aerial view of the B-777 wreckage at San Francisco International Airport. The 
airplane came to a stop beyond the end of the runway. The damage to the tail 
suggests that the airplane’s tail struck the ground first. (Courtesy of the 
National Transportation Safety Board.) 





The Evidence 


The aircraft’s engines and systems were found to have worked as designed. 
Further, flight data recorder parameters, including a variety of engine per- 
formance, airspeed, and vertical speed data, demonstrated that the airplane’s 
airspeed was too low and its vertical speed, that is, its descent rate, too fast in 
the moments before the accident. Given the absence of a pre-impact flaw with 
the engines and systems, investigators examined the performance of the two 
pilots who served as pilot flying and pilot monitoring during much of the 
flight, particularly on takeoff and on approach and landing. They looked at 
their background, training, company procedures with regard to approach 
and landing airspeed control, and the design of the interface between auto- 
throttle and pilot. 


The Pilots 


The pilot flying, serving as the captain on this flight, was not the pilot in 
command (as will be discussed shortly). He was 45 years old and had begun 
employment with the airline in an “ab initio” program designed to train peo- 
ple with no flight experience to become pilots at the airline. At the time of the 
accident he was rated in the Airbus A-320, the Boeing 737, 747-400, and the 
Boeing 777. He had 9,684 total flight hours, including 3,729 hours as 
pilot in command or captain, experience suggestive of an experienced pilot, 
albeit not in the B-777. 

He began his training to serve as a Boeing 777 captain about 6 months 
before the accident, and had successfully qualified as captain, accumulating 
33 hours of flight time and 24 hours of simulator time by the time of the 
accident. Before that he had served as captain on the A-320 for just over 5 years. 
Aviation regulations require air transport pilots to complete a period of super- 
vised flight to qualify to fly in their respective positions unsupervised, what 
is referred to as initial operating experience or IOE. In Korea, this require- 
ment called for pilots to fly 20 flight legs for a minimum of 60 flight hours. 
The captain had completed 8 flight legs and 33 hours 31 minutes of IOE flight 
before the accident flight. All of the landings on those flights involved 
instrument landing system approaches, in which the airplane is flown according 
to precise vertical and lateral guidance. On this flight, however, the approach 
in use was a visual approach, and the pilots were so informed; equipment 
providing precise vertical runway guidance was out of service. In modern 
aircraft, the autothrottle and autopilot can replicate the guidance needed 
to accurately fly visual approaches by applying internal airplane models of 
both vertical and lateral guidance, provided the pilots enter the necessary 
flight data into the system and closely monitor the approach to ensure that it 
remains within the necessary flight parameters. 

Investigators interviewed three training captains who had observed the 
captain on IOE flights in the Boeing 777. One said that the errors the captain 
made in the IOE were consistent with those of a pilot at that stage of the 
IOE, while another reported the pilot was “above average” in his IOE ride. 
By contrast, the third told investigators that (National Transportation Safety 
Board, 2014), 


The PF [pilot flying] was not well organized or prepared, conducted 
inadequate briefings, poorly monitored the operation, and deviated 
from multiple standard operating procedures (SOP). He said that the PF 
allowed the descent rate to get a little high on short final and allowed the 
nose to drop at an altitude of 200 to 100 ft. This had caused the airplane 
to go below the desired glidepath and forced the PF to initiate the flare 
early. The [instructor pilot or training captain] IP was not overly con- 
cerned, however, because he knew that the PF had to complete more OE 
[IOE] flights. (p. 15) 


The monitoring pilot was 49 years old with a total of 12,307 hours of flight 
experience, 9,045 of those as pilot in command, and 3,208 of those hours 
in command of a Boeing 777. On the accident flight he was acting as first 
officer but serving as the training captain supervising the pilot in his IOE, 
and was the pilot in command of the flight. He began his career as a pilot in 
the Korean Air Force, and had been flying the Boeing 777 for over 5 years, 
all of them as captain. He had qualified as a training captain about 2 months 
before the accident. The accident flight was his first flight in which he served 
unsupervised as a training captain. Pilots who had trained the training cap- 
tain in that role, or who had observed him training in that role, spoke favor- 
ably of his performance and qualifications to serve as a training captain. 

Investigators documented the local times (i.e., in Korea) that the two pilots 
went to sleep and got up in the days preceding the accident flight. They 
determined that the pilots were likely fatigued at the time of the accident because 
of the nature of the flight itself: the fragmented sleep the pilots had received 
in the 24 hours before the flight and in their rest periods during the flight, 
and the time of day, which corresponded to early morning according to the 
pilots’ local time, when they would ordinarily have been deeply asleep. 
Research has found an association between performance in the early morning 
hours and a higher than expected error and accident rate (e.g., Lenne, Triggs, 
and Redman, 1997). 


The Approach to San Francisco 


According to investigators, the critical crew errors that led to the accident 
began less than 5 minutes before impact, when air traffic control directed them 
to slow the airplane to 180 knots from 210 knots, and to maintain that speed 
until 5 miles from the runway. At that point the aircraft was about 14 (nautical) 
miles from the runway. The pilot flying slowed the airplane using its autopilot 
system, entering the 180 knot speed into the autopilot/autothrottle system and 
executing it in the automated system to maintain that airspeed. However, the 
autopilot responded by raising the nose of the airplane, the preferred tech- 
nique to reduce airspeed, but not by reducing the thrust. This reduced the 
airspeed as directed, but its vertical speed was increased as the airplane began 
to climb. Neither pilot noticed the airplane's deviation from the desired 
vertical flight path, although investigators pointed out that the airplane's 
navigation display presented information indicating the climb. 

Eleven and a half miles from the runway the flying pilot entered a descent 
rate into the autopilot system, in an effort to descend. Shortly thereafter, he 
asked the pilot monitoring, the check pilot serving as the monitoring pilot 
on this flight, to lower the landing gear. Three seconds later, about 3 min- 
utes before the airplane struck the edge of the runway, the monitoring pilot 
told the flying pilot, “this seems a little high,” an English translation of the 
Korean language the pilots used when communicating among themselves 
(communications with U.S. air traffic controllers were in English). The flying 
pilot, after some initial discussion with the monitoring pilot, increased the 
descent rate to 1500 feet a minute, then reduced it back to 1000 feet per min- 
ute at 6.3 miles from the runway. The airspeed was then 178 knots and the 
airplane was several hundred feet above the intended height. 

At this point the pilots began performing another element of instrument 
approach flying, preparing for a missed approach. A missed approach, in 
which the pilots cease conducting the approach and fly away from the air- 
port, is necessary if the controllers cancel a landing clearance, the airplane is 
not properly aligned for a landing, or other reasons. To ensure that the air- 
plane climbed to the appropriate altitude in the event that the crew needed 
to execute a missed approach, the pilot flying began to review the missed 
approach procedure for that runway. He informed the pilot monitoring that 
the missed approach altitude was 3000 feet, an altitude higher than the air- 
plane's altitude at that point. Shortly thereafter, the pilot entered 3000 feet 
into the autopilot as the selected altitude. 

At 5 miles from the runway the flight was about 400 feet above the desired 
altitude but below 3000 feet altitude. The pilot flying changed the autothrottle 
mode in an effort to increase the descent rate, and also changed the target 
airspeed in the autothrottle mode to 152 knots. He asked the monitoring pilot 
to extend the flaps from 5° to 20°, as required at that point on the approach. 
However, the target airspeed was about 20 knots slower than the airplane’s 
speed at that point, and the selected altitude was higher. As a result, the auto- 
pilot raised the nose of the airplane to climb to the selected 3000 foot altitude 
while the autothrottle increased engine thrust. The flying pilot immediately 
recognized the change in thrust and in pitch and overrode the autothrottle 
by manually moving the thrust levers to idle, and pushed the control column 
down to lower the nose. However, neither pilot was aware that manually 
reducing the thrust and overriding the autothrottle changed the autothrottle 
mode from the one they had selected to one in which it no longer controlled the 
airspeed. The mode change was not signaled by an aural alert; rather, it was 
identified by a change in the identity of the mode displayed in a panel in front 
of the pilots. The pilots, who were not looking at that display, were unaware of 
both the autothrottle mode change and its effects on speed control. They did 
not know that the autothrottle was no longer controlling the airspeed. 

About a minute from impact the airspeed was still higher than desired. The 
flying pilot then changed the target autothrottle speed to 137 knots, the speed 
desired for landing. Informal airline guidance called for pilots, when flying 
visual approaches, to disengage both pilots’ flight directors at that point. On 
the B-777 the flight directors, instruments that provide information on con- 
trolling the control columns, also affect autothrottle mode. Airline guidance 
called for pilots to disengage both flight directors, one for each of the two 
pilots, and then reengage only the first officer's flight director. Disengaging 
both flight directors causes the autothrottle mode to default to the one most 
recently engaged, in this case the one the flying pilot had selected that main- 
tained the airspeed. However, the first officer’s flight director was not dis- 
engaged, only the pilot flying’s flight director. As a result, the autothrottle 
remained in the mode in which it was operating, in this case a mode that no 
longer controlled airspeed. 
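
The mode interactions described in the preceding two paragraphs can be summarized as a small state model. This is only a sketch of the behavior as narrated here, not the manufacturer's actual logic: the class, method names, and the "SPEED" label are hypothetical, while "HOLD" follows the mode name used later in this chapter.

```python
from dataclasses import dataclass

SPEED = "SPEED"  # hypothetical label for the airspeed-maintaining mode
HOLD = "HOLD"    # mode in which the autothrottle no longer controls airspeed


@dataclass
class Autothrottle:
    """Simplified model of the autothrottle behavior described in the text."""
    mode: str = SPEED
    last_selected: str = SPEED
    pilot_flying_fd_on: bool = True
    first_officer_fd_on: bool = True

    def manual_thrust_override(self) -> None:
        # Moving the thrust levers by hand changes the mode; per the text,
        # no aural alert accompanies the change.
        self.mode = HOLD

    def set_flight_director(self, seat: str, on: bool) -> None:
        if seat == "pilot_flying":
            self.pilot_flying_fd_on = on
        elif seat == "first_officer":
            self.first_officer_fd_on = on
        else:
            raise ValueError(f"unknown seat: {seat}")
        # Only when BOTH flight directors are off does the mode revert
        # to the one the pilot most recently selected.
        if not self.pilot_flying_fd_on and not self.first_officer_fd_on:
            self.mode = self.last_selected

    def controls_airspeed(self) -> bool:
        return self.mode == SPEED


# Sequence narrated above: thrust override, then only one flight director off.
at = Autothrottle()
at.manual_thrust_override()
at.set_flight_director("pilot_flying", False)
print(at.mode, at.controls_airspeed())
```

In this model, switching off the first officer's flight director as well would have returned the autothrottle to the airspeed-maintaining mode; with only one switched off, it stays in HOLD.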

The airplane reached the selected airspeed of 137 knots but shortly 
thereafter one of the two observer pilots in the cockpit called out “it’s high.” After 
the pilot monitoring called out that the airplane was 1000 feet above the 
ground, the observer pilot called out “sink rate.” The flying pilot responded 
“yes sir.” Six seconds later the observer pilot again called out “sink rate, sir.” 
Ten seconds later air traffic control cleared the aircraft to land and, about 
30 seconds before impact, one of the pilots reiterated the landing clearance 
and announced that the landing checklist had been completed, that is, all the 
landing tasks performed. About 10 seconds later one of the pilots, 
presumably the monitoring pilot, said, “it’s low,” to which the other pilot responded 
“yeah.” About 10 seconds later, 8 seconds before impact, a pilot said “speed,” 
repeating it 2 seconds later. The stick shaker, a device in the 
control column that provides both tactile and audio cues of an impending 
aerodynamic stall, then sounded. Three seconds before impact a pilot ordered 
“go around,” but at that point the airplane’s descent could not have been arrested. 
It struck the edge of the runway 3 seconds later. 





The Error 


By continuing to fly the approach while attempting to both slow the airspeed 
and have the airplane descend from its height above the desired vertical path, 
the pilots added to what was already the highest workload phase of their flight, 
approach and landing. Because aircraft operate in three dimensions, particu- 
lar preparation is needed when flying an air transport aircraft on approach to 
landing to ensure that it is flying within narrowly acceptable parameters along 
the three dimensions. Not fully establishing the necessary airspeed and flight 
path parameters on approach to landing can challenge the best pilots to bring 
the airplane back to acceptable (or what is referred to as “stabilized”) flight in 
the remaining time available before landing. That is, trying to slow the 
airplane while also descending it on the vertical profile and maintaining 
the appropriate lateral course to the center of the runway becomes more 
difficult closer to the runway, when the tolerance for exceeding vertical and lateral 
flight parameters becomes increasingly small. This is especially true when 
precise vertical guidance is unavailable, as the pilots had been informed. The 
pilots’ failure to stabilize the airplane’s airspeed, vertical speed, and flight 
path when initiating the approach created a situation in which (1) they were 
unable to bring the airspeed and descent rate to acceptable limits and (2) their 
workload increased to the point where they could not devote the attention 
needed to fully monitor necessary aspects of the flight. At the same time, 
their lack of complete understanding of the automation exacerbated the 
difficulties that they themselves had created in controlling the airplane. Fully 
delegating airspeed control to the automation is acceptable so long as the 
airspeed is monitored and pilots are prepared to quickly retake control if needed. 
However, the research on automation indicates that operators will likely 
delegate control to automation when their workload is high, as occurred in this 
accident. As Bainbridge (1983) might have pointed out, an irony of 
automation is that by fully delegating and not monitoring automation the 
operator removes himself or herself from automation oversight during 
operating phases such as this, when monitoring is most needed. 

The pilots’ major error, their failure to monitor airspeed, concerns a flight 
parameter that, with altitude and vertical speed, pilots must be aware of 
and control, especially during climbs and descents. The antecedents to the 
pilot errors, while compelling, should still not have prevented the pilots 
from monitoring airspeed during approach. Regardless of workload or lack 
of knowledge of the autothrottle system, pilots are expected to apply their 
fundamental training to monitor airspeed on approach and landing; these 
antecedents do not excuse the failure to do so. Rather, identifying the 
antecedents allows investigators to explain the errors and understand how they 
came about. Although other errors preceded this one, for investigative purposes, 
the critical antecedents to this single error were the pilots’ self-created 
workload and lack of understanding of the autopilot/autothrottle. The latter 
resulted from a combination of 


* Operator experience 
* Operator fatigue 
* Airplane system complexity and automation opacity 
* Manufacturer information about its automated system 
* Automation training 
* Company automation policy 





Antecedents to Error 


In this, as in other accidents, a unique interaction of antecedents led to the 
key pilot error. These were the pilots' incomplete knowledge and 
misunderstanding of the airplane's automation capabilities, a result of manufacturer 
shortcomings in its automation design and in the information it provided 
to airlines operating the 777. These combined with several pilot/airline 
factors, including inexperience in flying visual approaches and in the pilots' 
respective roles on the airplane, effects from previous experience on an 
airplane with a stall protection system similar to, yet critically different from, 
that of the B-777, and the airline's policy on automation use. Finally, the error 
in not disengaging both flight directors contributed to the pilots' key error. The 
pilots believed that the autothrottle would maintain airspeed regardless of 
other pilot actions; consequently, they focused their monitoring on flight 
path and did not monitor airspeed. 

Investigators noted that the pilots were required, no later than when reach- 
ing 5 miles from the runway, to discontinue an approach if the airplane 
was not within acceptable flight path and speed parameters. The effects of 
the pilots' continuing the approach while attempting to stabilize the flight 
path and conform to the approach requirements created a workload that 
increased the closer the airplane was to the runway. As the airplane neared 
the runway there was less time available to bring both speed parameters 
to acceptable levels, while speed, altitude, and descent rate tolerances 
narrowed. The resultant workload prevented the pilots from monitoring the 
airspeed effectively. By contrast, discontinuing the approach, though not a 
critical error, would have reflected poorly on both pilots, in particular the 
pilot flying, who was still not qualified to fly the airplane unsupervised and 
may have needed additional observation time had he done so. Investigators 
suggested that this knowledge may have encouraged the pilots to continue 
the approach despite its being unstabilized. 
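
The decision rule described above, to discontinue the approach if it is not within acceptable parameters by 5 miles from the runway, can be expressed as a simple gate check. The sketch below is illustrative only: the function name and the tolerance values are hypothetical placeholders, not the airline's actual stabilized approach criteria. The sample inputs are the approximate deviations reported earlier in this chapter for the 5-mile point.

```python
def should_discontinue(distance_nm: float,
                       airspeed_dev_kt: float,
                       altitude_dev_ft: float,
                       gate_nm: float = 5.0,
                       max_speed_dev_kt: float = 10.0,    # assumed tolerance
                       max_alt_dev_ft: float = 100.0) -> bool:  # assumed tolerance
    """Return True if the airplane has reached the gate and is still
    outside the (assumed) airspeed or flight path tolerances."""
    if distance_nm > gate_nm:
        return False  # gate not yet reached; there is still time to stabilize
    too_fast_or_slow = abs(airspeed_dev_kt) > max_speed_dev_kt
    too_high_or_low = abs(altitude_dev_ft) > max_alt_dev_ft
    return too_fast_or_slow or too_high_or_low


# At 5 miles: about 20 knots above target speed, about 400 feet high.
print(should_discontinue(5.0, airspeed_dev_kt=20.0, altitude_dev_ft=400.0))
```

A rule of this kind removes the judgment from a high-workload moment: once the gate is reached, the only question is whether the numbers are inside the limits.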


Operator Antecedents: Experience 


The pilots were both experienced in air transport operations, with the moni- 
toring pilot an experienced Boeing 777 captain in addition to his exten- 
sive flight experience. However, both pilots were inexperienced in several 
elements critical to this accident. The captain, or pilot flying, was new to 
the Boeing 777 and was still under instructor pilot observation. All of the 
eight B-777 landings he had conducted were under supervision, and the 
approaches were all instrument landing system approaches, which provided 
precise vertical and lateral flight path guidance. The accident flight was his 
first attempt in the B-777 to execute a non-precision approach, an approach 
that is rare in air transport operations. The two types of B-777 inexperi- 
ence, in the airplane in general and in flying a visual approach in particular, 
served as an antecedent to his error of initiating, and then maintaining an 
approach with both the airspeed and the vertical speed, at different points in 
the approach, outside of acceptable parameters. 

Further, the pilot flying’s experience may well have interfered with his 
understanding of the B-777's automation system, and thus served as an 
antecedent to his errors. His previous airplane experience was on an Airbus A-320. 
Although both airplanes are considered highly automated, equipped with 
full autopilot/autothrottle systems, the A-320, unlike the B-777, has a 
protection designed to prevent the airplane from entering a stall, a protection that 
can only be disengaged by a specific pilot action. Further, his more recent 
training on the B-777 included a presentation on the B-777's stall protection 
system, a system that, unlike that of the A-320, could be 
disengaged without explicit pilot action. Investigators suggested that his 
recent experience on an airplane with a stall protection that remained active 
in the absence of specific pilot action may have interfered with his airspeed 
monitoring on the B-777, in the mistaken belief that the autothrottle would 
continue to offer stall protection throughout the flight. 

The instructor pilot, serving as the pilot monitoring and as first officer, 
was conducting his first unsupervised observation flight. A critical element 
of an instructor pilot’s responsibilities is being cognizant of the performance 
of the pilot under observation, and recognizing both when it is appropriate 
to call the flying pilot’s attention to flight parameters and when it is neces- 
sary to take control of the airplane when flight safety is considered at risk. 
Commenting at an inopportune or inappropriate time, or taking airplane 
control unnecessarily, considered an extreme act, can negatively affect pilot 
performance and be counterproductive. On the other hand, not comment- 
ing on pilot actions or decisions when necessary can contribute to poor pilot 
performance, and delaying taking airplane control when called for can 
compromise flight safety. Allowing pilots to make mistakes is an effective way to 
promote learning, but recognizing when flight safety is at risk is also key to 
being an effective flight instructor. The instructor serving as the pilot 
monitoring on this flight erred in both respects: he did not call the pilot flying's 
attention to the deteriorating airspeed until late in the approach, and did not 
take control of the flight when flight safety was endangered. 


Operator Antecedents: Fatigue 


Both pilots were also fatigued from the duration of the flight, and by the 
time of day at which the accident occurred. The pilots were scheduled to 
report for duty at 1510 local time and depart over an hour later. The acci- 
dent occurred at 1127 San Francisco time, or 0327 Korean time, a time that 
corresponds to what has been demonstrated to be a low point in people’s 
sleep. Although they rested on the airplane, with the pilot flying 
and pilot monitoring sleeping about 2 and 3 hours, respectively, during the 
approximately 10½-hour flight, investigators concluded that their disrupted 
sleep and the effects of time of day on their circadian rhythms led to their 
being fatigued. As investigators concluded, “all three pilots were likely 
experiencing some fatigue at the time of the accident, and each made errors that 
were consistent with the effects of fatigue” (National Transportation Safety 
Board, 2014, p. 86). 
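
The time-of-day conversion cited above can be checked directly. The sketch below uses fixed UTC offsets (UTC-7 for San Francisco daylight time in July, UTC+9 for Korea); it also assumes that the 1510 report time was Korean local time on July 6, which the text implies but does not state outright.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7), "PDT")  # San Francisco, daylight time
KST = timezone(timedelta(hours=9), "KST")   # Korea Standard Time

# Accident time in San Francisco, converted to the pilots' home time zone.
accident_sf = datetime(2013, 7, 6, 11, 27, tzinfo=PDT)
accident_korea = accident_sf.astimezone(KST)
print(accident_korea.strftime("%Y-%m-%d %H%M %Z"))  # 2013-07-07 0327 KST

# Assumed: scheduled report time of 1510 Korean local time on July 6.
report_korea = datetime(2013, 7, 6, 15, 10, tzinfo=KST)
print("Elapsed since scheduled report time:", accident_korea - report_korea)
```

Under these assumptions the accident occurred a little over 12 hours after the scheduled report time, at an hour when the pilots would ordinarily have been asleep.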


Equipment Antecedents: Airplane System Complexity 
and Automation Opacity 


The pilot flying was unaware that when reducing the thrust to idle, the auto- 
throttle would change its operating mode from one that actively maintained 
the pilot-selected airspeed to one, called “hold,” in which it was disengaged 
from airspeed control. Investigators focused on what contributed to the 
pilots’ lack of familiarity with this automation feature, and why they nev- 
ertheless did not notice that the airspeed decreased to where it was below a 
safe speed, despite their extensive training and their demonstrated knowl- 
edge of the airplane’s autopilot and autothrottle systems. 

On the Boeing 777 and other modern air transport aircraft, the autothrottle, 
autopilot, and flight directors are interconnected; input into one can affect the 
others. The airline’s B-777 pilots were informed, in their training on the air- 
plane and in the airline’s flight manual (which they were required to review 
and whose key points they were required to master) of how the autothrottle 
mode changed, either through direct pilot input or through interaction with 
another feature of the automation, and how these mode changes were displayed
to the pilots. The automated modes were presented on a flight display panel in
front of each pilot. To recognize that the modes had changed, the pilots had
to look at a display, know the mode, and understand its effects on flight
control. However, the display was outside the expected visual scan of a pilot
flying an approach; without an aural alert, a pilot’s ability to notice the
change would be reduced if workload was high or if he or she was distracted
(Figure 16.3).


FIGURE 16.3

Primary flight display on Boeing 777. Airspeed at left column in bracket, vertical speed in
right-most column, with automation mode presented in green at top. Autothrottle mode (top
center box, left of HDG SEL) not displayed, only Heading Select (autopilot flight director roll
mode) and Altitude Select (autopilot flight director pitch mode) displayed. (Courtesy of the
National Transportation Safety Board.)


The pilots also erred by not fully disengaging both flight directors and then
reengaging the flight director of the first officer (the pilot monitoring on
this flight), despite the airline’s flight crew training manual directing
pilots to do so when “intercepting the visual profile,” that is, when
established on the vertical path in the visual approach. Disengaging the
flight directors would have caused the autothrottle to change to a default
mode that maintained airspeed control. However, the airline did not explain to
its pilots the reason for disengaging both flight directors and reengaging one
when conducting a visual approach. The pilots failed to recognize a mode
change that had occurred independently of any direct pilot action. The mode
change was precipitated, to some extent, by their failure to disengage both
flight directors and reengage one, as well as by the flying pilot’s reducing
the thrust to idle while on the approach. The combination of high workload and
lack of transparency regarding the reason for flight director
disengagement-reengagement partially led to this error. The error then
contributed to the disengagement of the autothrottle in the absence of
specific pilot action to do so.


Equipment Antecedents: Equipment Information


The Boeing 787 and the Boeing 777 autothrottles shared a similar feature:
neither would “wake up” or reengage if, when in the thrust “hold” mode (the mode
that the autothrottle defaulted to when the flying pilot reduced the thrust on
approach), the airspeed decayed to less than that selected. However,
investigators learned that during the 787 certification flights, in which the
regulator, the Federal Aviation Administration, determines the extent to which
a new airplane meets the requirements to be approved for flight, a Federal
Aviation Administration test pilot, who had been unaware of this feature,
recognized and commented to the agency on its adverse safety characteristics.
In response, Boeing added to its 787 airplane flight manual the notice,
“When in HOLD mode, the autothrottle will not wake up even during
large deviations from target speed and does not support stall protection.”
However, the notice, which was sent to all 787 users, was not sent to 777
users, despite the 777’s having the same feature (the B-777 flight manual had
been approved without it and thus there was no “requirement” that users be so
informed after the fact). Consequently, unless airlines changed their flight
manuals on their own, or their pilots learned of the issue informally, 777
pilots were unaware that the autothrottle would not reengage and maintain
airspeed control if operating in the hold mode. Investigators learned that
the pilot flying was unaware of this feature, believing instead that the
autothrottle would maintain pilot-selected airspeed through all phases of
flight when he engaged it.


Company Antecedents: Automation Policy 


The airline contributed an antecedent by suggesting to its pilots, through a
simulator demonstration, that the autothrottle would maintain the selected
airspeed. The demonstration showed that with both autothrottle and autopilot
disengaged, if thrust was reduced to idle and the airspeed allowed to fall
below minimum speed (as shown on the airplane’s airspeed display), the
autothrottle would “wake up” and increase thrust to return airspeed to minimum
maneuvering speed. It suggested to pilots that the autothrottle’s airspeed
protection would remain active through all phases of flight, and that they
could rely on it to maintain the necessary speeds. Of course, without
knowledge that the autothrottle in “hold” mode would not respond to critically
slow airspeed, the airline could not have known that its demonstration to its
pilots was based on erroneous information.

The airline’s policy also encouraged its pilots to fully use the airplane’s 
automation during approach and landing. The airline’s B-777 chief pilot 
told investigators that the airline recommended using as much automation 
as possible when flying the airplane. By contrast, many airlines encourage
their pilots to disengage the automation and control the airplane manually
during approach and landing, primarily to enable them to retain their
manual flying skills during flight phases when these skills are most needed,
and to keep pilots fully in the loop of aircraft control during approach
and landing flight phases. The airline’s encouragement of automation use,
while not in and of itself an antecedent to the pilots’ error, failed to take
into account that extensive reliance on automation can, over time, diminish
the skills operators need when the automation does not act as expected. As
described previously, a system that consistently performs accurately and
reliably will, over time, subtly influence operators’ oversight of that system
and can lead an operator to fail to notice system anomalies in the event they
occur. The company’s promotion of automation exacerbated the effects of the
pilots’ mistaken belief that the B-777’s airspeed would not deteriorate to
below a safe speed.

Figure 16.4 demonstrates the relationship of the operator, manufacturer, 
and company antecedents to the pilots’ error of not monitoring the airspeed. 





Operator antecedents

* Created high workload condition by not stabilizing
airspeed and vertical speed during final approach

* Individually inexperienced in critical performance
skills

* Did not fully disengage and reengage flight directors

* Fatigue


Manufacturer antecedents

* Visual presentation only of operating mode with no
aural indication of autothrottle mode change

* Opaque autothrottle mode change independent of
direct pilot action

* Did not inform operators that one autothrottle
operating mode was disengaged from airspeed control

* Opaque relationship between flight director and
autothrottle mode


Company antecedents

* Created the impression to pilots that autothrottle
continuously maintains pilot-selected airspeed

* Encouraged pilots to use only automation while flying
on final approach


FIGURE 16.4
The antecedents that led to the pilots’ failing to monitor airspeed on approach to landing.


The individual antecedents to error in this accident can be attributed largely
to several factors operating together: automation design, training, and airline 
policies that combined to adversely affect pilot performance once they had 
created a high workload situation during the approach. It is possible, if not 
likely, that this combination of antecedents would not have led to the pilots’ 
error had the antecedents been present during a routine operating phase. 
However, during a phase of unusually high workload, with a pilot flying 
who had never before flown this airplane on an approach that lacked precise 
vertical path guidance, and with an observation pilot relatively inexperi- 
enced in recognizing when he needed to take action to avoid a safety-related 
flight issue, the combination of circumstances allowed the antecedents to 
adversely affect the pilots’ performance. 




Antecedents and Errors


Relationships between Antecedents and Errors


In Chapter 3, the determination of antecedents was described as based on
research supporting the analysis. In this accident, considerable research on
automation use, operator trust in automation, and operator awareness of
mode changes supported the investigators’ conclusions. Further, investigators
attempted to answer two questions to determine whether the relationships
proposed between antecedent, error, and the event under consideration met
standards of acceptability: (1) would the error have occurred if the
antecedents that preceded it had not been present, and (2) would the accident
have occurred if the error that preceded it had not been committed? As an
additional check, three criteria were proposed to determine the value of
relationships between antecedents and errors. These require the relationships
to be simple, logical, and superior to other possible relationships. In this
accident the answers to the counterfactual questions are clear: had the crew
monitored the airspeed, the accident would not have occurred, and had the
antecedents cited not been present, the crew would have monitored the airspeed
and hence would have recognized, in time to avoid the accident, that it was
deteriorating and needed to be increased. The relationships between
antecedents and error, moreover, meet investigative standards of logic,
simplicity, and superiority to other explanations for the error.
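
The two counterfactual questions and the three criteria can be read as a short checklist applied to each proposed relationship. The sketch below is only an illustration of that reading: the `ProposedLink` structure, its field names, and the `unmet_checks` function are hypothetical and do not come from any investigative agency's procedures. The judgments themselves (the True/False values) remain the investigator's; the code merely records them and reports which checks a relationship fails.

```python
from dataclasses import dataclass


@dataclass
class ProposedLink:
    """One proposed antecedent -> error -> event relationship (hypothetical structure)."""
    description: str
    error_occurs_without_antecedents: bool  # answer to counterfactual question (1)
    event_occurs_without_error: bool        # answer to counterfactual question (2)
    simple: bool
    logical: bool
    superior_to_alternatives: bool


def unmet_checks(link: ProposedLink) -> list[str]:
    """Return the checks that the proposed relationship fails (empty if none)."""
    failures = []
    # If the error would have happened anyway, the antecedents do not explain it.
    if link.error_occurs_without_antecedents:
        failures.append("antecedents not necessary for the error")
    # If the event would have happened anyway, the error does not explain it.
    if link.event_occurs_without_error:
        failures.append("error not necessary for the event")
    for name in ("simple", "logical", "superior_to_alternatives"):
        if not getattr(link, name):
            failures.append(f"criterion not met: {name}")
    return failures


# The relationship as judged in this accident: both counterfactual answers are
# "no," and all three criteria are considered met.
airspeed_link = ProposedLink(
    description="antecedents -> failure to monitor airspeed -> accident",
    error_occurs_without_antecedents=False,
    event_occurs_without_error=False,
    simple=True,
    logical=True,
    superior_to_alternatives=True,
)
print(unmet_checks(airspeed_link))  # prints []
```

A relationship that returns a non-empty list would, under this reading, need to be reconsidered before being cited in the analysis.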


Terminating the Search for Antecedents 


Chapter 3 also addressed the point at which the search for antecedents
should stop. One can go back seemingly indefinitely to antecedents that may
have influenced the errors, but at some point one would reach diminishing
returns and the additional search would not be worth the effort. In this
accident, that search terminated at the manufacturer and the airline.
Searching for antecedents beyond these would have diluted the importance of
the antecedents that are closest to the critical errors leading to the
accident.





Recommendations 


Investigators issued 21 recommendations to address deficiencies they
identified in this accident. Some addressed shortcomings regarding airport fire
and rescue capabilities, but 12 of the recommendations addressed crew
performance with a highly automated airplane and the antecedents to error
discussed here. The recommendations, which were directed to the Federal
Aviation Administration, the airline, and the airplane’s manufacturer, can be
seen to address each of the antecedents cited, except for the pilots’
workload. That was already addressed by the airline’s requirement for its
pilots to discontinue approaches that are not stabilized. Among other
recommendations, investigators asked the Federal Aviation Administration
to (National Transportation Safety Board, 2014):


Require Boeing to develop enhanced 777 training that will improve 
flight crew understanding of autothrottle modes and automatic activa- 
tion system logic through improved documentation, courseware, and 
instructor training. 

Once the enhanced Boeing 777 training has been developed, as 
requested in...[the previous recommendation], require operators and 
training providers to provide this training to 777 pilots. 

Require Boeing to revise its 777 Flight Crew Training Manual stall pro- 
tection demonstration to include an explanation and demonstration of 
the circumstances in which the autothrottle does not provide low speed 
protection. 

Once the revision to the Boeing 777 Flight Crew Training Manual 
has been completed, as requested in...[the previous recommendation], 
require operators and training providers to incorporate the revised stall 
protection demonstration in their training. 

Convene an expert panel (including members with expertise in human 
factors, training, and flight operations) to evaluate methods for training 
flight crews to understand the functionality of automated systems for 
flightpath management, identify the most effective training methods, 
and revise training guidance for operators in this area. 

Convene a special certification design review of how the Boeing 777 
automatic flight control system controls airspeed and use the results
of that evaluation to develop guidance that will help manufacturers
improve the intuitiveness of existing and future interfaces between 
flight crews and autoflight systems. 

Task a panel of human factors, aviation operations, and aircraft 
design specialists, such as the Avionics Systems Harmonization 
Working Group, to develop design requirements for context-depen- 
dent low energy alerting systems for airplanes engaged in commercial 
operations. (p. 130) 


Investigators also asked the airline to:


Revise your flight instructor operating experience (OE) qualification 
criteria to ensure that all instructor candidates are supervised and 
observed by a more experienced instructor during OE or line train- 
ing until the new instructor demonstrates proficiency in the instructor 
role. 

Issue guidance in the Boeing 777 Pilot Operating Manual that after dis- 
connecting the autopilot on a visual approach, if flight director guidance 
is not being followed, both flight director switches should be turned off. 

Modify your automation policy to provide for more manual flight, 
both in training and in line operations, to improve pilot proficiency. (pp. 
131-132) 


Investigators asked the manufacturer to:


Revise the Boeing 777 Flight Crew Operating Manual to include a spe- 
cific statement that when the autopilot is off and both flight director 
switches are turned off, the autothrottle mode goes to speed (SPD) mode 
and maintains the mode control panel-selected speed. 

Using the guidance developed by the low energy alerting system panel 
created in accordance with... [a recommendation issued as a result of 
this accident] develop and evaluate a modification to Boeing wide-body 
automatic flight control systems to help ensure that the aircraft energy 
state remains at or above the minimum desired energy condition during 
any portion of the flight. (p. 132) 





Summary 


This accident illustrates how antecedents that individually would likely not
have led to error interacted during the final approach and attempted landing
to lead the operators to commit a fundamental piloting error. All pilots,
from the beginning of their flying careers, are taught to monitor their
airspeed, yet these pilots failed to do so through the most critical flight
operational phases, approach and landing.

They created the circumstances that allowed the antecedents to interact
and affect their performance by failing to simultaneously stabilize both the
airplane’s airspeed and vertical speed during the approach, so that one or the
other was outside of acceptable operating range throughout the approach.
Their efforts to stabilize both flight parameters in the decreasing available
time until landing increased their workload to the point that they were
unable to devote the effort needed to monitor critical aspects of the approach.

As a result of the design of the airplane's display, in which only visual 
information about the autothrottle operating mode change was presented, 
the pilots failed to notice that the autothrottle had changed modes indepen- 
dent of any direct action on their part. An aural alert signaling the change 
would likely have caught their attention in a way that the visual presenta- 
tion, in the high workload environment, did not. 

The crew’s lack of awareness of the mode change was exacerbated by the 
manufacturer's failure to inform its operators that this airplane allowed that 
particular autothrottle operating mode, the mode that the autothrottle itself 
engaged when the pilot flying briefly moved the thrust levers to flight idle, 
to disengage itself from speed control. 

The pilots’ lack of awareness of (1) the change in autothrottle mode
and (2) the effects of that change on airspeed control was exacerbated by
their respective inexperience: the pilot flying in executing nonprecision
approaches in this airplane, and the pilot monitoring in observing the
performance of B-777 pilots. Together, their inexperience contributed to their
errors. Further, their fatigue contributed to degradation in their monitoring
of critical flight parameters, monitoring that was already compromised by
the high workload the pilots had created for that phase of flight.

Finally, the pilots’ expectations regarding speed control and their manual 
flying performance were adversely affected by the airline’s training and 
policies. They were led to believe that the autothrottle would not allow air- 
plane speed to deteriorate to an unsafe level, and they were encouraged to 
rely exclusively on the airplane’s automation for flight and airspeed control 
throughout the different phases of flight. They failed to notice, in time to 
avoid the accident, that they needed to control airspeed manually.


Final Thoughts











Most investigations are carried out after an event has occurred. But the 
search to reduce opportunities for error should be ongoing, even in the 
absence of an accident. System managers, administrators, operators, regula- 
tors, and others involved in operating complex systems need to be vigilant 
in the search for system deficiencies that could lead to errors and accidents. 
The potential for unknown and unrecognized antecedents to error residing
in systems is too great to allow regulators, companies, and operators to
become complacent about the safety of their systems. Avoiding complacency
and remaining on guard to identify and mitigate error antecedents is
proactive and in the best interests of safety.

Given the daily pressures on those involved in system operations, one can
expect to encounter difficulties conducting proactive investigations in
the absence of an incident or accident (e.g., Carroll, Rudolph, Hatakenaka,
Wiederhold, and Boldrini, 2001). Few operators, managers, or regulators 
have the time available for the data gathering and analysis activities that are 
needed to recognize and suggest remediation strategies to reduce system 
deficiencies and vulnerabilities. 

Reason (1997) offers several techniques to improve the “safety culture” 
of complex systems. He argues that safety cultures are designed to reduce 
opportunities for error in complex systems, and actions can be taken before 
the fact to improve the safety of many aspects of system operations, from 
maintenance, to regulation, to daily operations, if the necessary data have 
been collected and disseminated. As he writes, 


In the absence of bad outcomes, the best way—perhaps the only way— 
to sustain a state of intelligent and respectful wariness is to gather the 
right kinds of data. This means creating a safety information system 
that collects, analyses and disseminates information from incidents and 
near misses as well as from regular proactive checks on the system’s 
vital signs. All of these activities can be said to make up an informed 
culture—one in which those who manage and operate the system have 
current knowledge about the human, technical, organizational and envi- 
ronmental factors that determine the safety of the system as a whole. In 
most important respects, an informed culture is a safety culture. (p. 195) 


Strauch (2015), by contrast, distinguishes between organizational errors
and individual operator errors, and provides guidance to investigators to
identify and investigate the role of antecedents to organizational errors.
Companies that were aware of operational deficiencies and did not take
action to address them, or companies that, because of the nature of the
organizational shortcomings, should have addressed them but did not, will have
committed organizational errors and thus can be considered to have caused
or contributed to the cause of the errors that their operators have committed.

While it may be difficult for companies to acknowledge that their actions or 
inactions led to errors that caused accidents, such acknowledgement is nec- 
essary for safe operation. Only by clear, objective, and systematic data gath- 
ering efforts, irrespective of where the data lead, can companies learn from 
their own errors and implement meaningful measures to enhance safety. 
Managers, administrators, regulators, and others hoping to obtain “the right 
kinds of data” can accomplish this in several ways. Reason suggests devel- 
oping and implementing a self-reporting system in which employees can 
report safety deficiencies in a non-punitive environment. A self-reporting 
system should encourage learning about safety deficiencies and security vul- 
nerabilities before they lead to potentially severe consequences. Companies 
can also conduct proactive investigations in response to minor events. These 
investigations may highlight previously unknown safety-related informa- 
tion, and improve investigative skills as well. 





Investigative Proficiency 


Conducting safety reviews and proactive investigations also helps maintain and
enhance investigative proficiency, as well as providing information to enhance
safety. The environment in which proactive incident investigations are
conducted, in the absence of a major event, is also likely to be free of the
stresses that often follow major events and therefore likely to be a
supportive environment for novice investigators. Investigative skill is like
any other: the more one practices it, the better one will be when called on to
exercise it.





Criteria 


Because of routine operational needs, managers and administrators may be
reluctant to divert potentially valuable resources from operational duties to
conduct incident investigations, despite the likely long-term safety benefits
of doing so. Instead, they may focus on the personnel and resource
expenditures needed to conduct proactive investigations.

Investigators can help managers and administrators select events that
warrant investigations by developing criteria with which to evaluate the need
for proactive investigations. The criteria that follow are applicable to
nearly all complex systems:


* Type and frequency of previous operator errors committed and
severity of their consequences
* Frequency and severity of recent incidents
* Interval since most recent investigation
* Amount and value of available system safety data
* Value of potential lessons learned

The greater the number of operator errors in the incident, the more serious
their consequences, the more frequent the recent incidents, and the lower
the amount and value of available system safety data, the more an
investigation is warranted.

An incident in which many errors were committed has a greater need for
investigation than one with just a few. A system that has experienced a
relatively high number of recent incidents also would benefit from proactive
investigation. Both conditions suggest the presence of safety deficiencies
that could otherwise lead to a major accident, and investigating them could
lead to effective recommendations to address the deficiencies. On the other
hand, a system that already collects a substantial amount of safety data may
not benefit as much from a proactive investigation as would one with less
data. In those instances, the cost of collecting additional safety-related
data may not be outweighed by the potential benefits, assuming the data
provide information about the presence of known system safety deficiencies.
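
One way to make this weighing explicit is a simple additive score, sketched below. Everything specific in it is an assumption made for illustration: the 0-to-3 rating scale, the equal weighting of factors, and the names `IncidentRatings` and `investigation_priority` are hypothetical, and the sketch covers only the four factors named in the rule above, leaving out the interval since the last investigation and the value of potential lessons. Existing safety data counts inversely, since the more data a system already has, the less a proactive investigation is likely to add.

```python
from dataclasses import dataclass


@dataclass
class IncidentRatings:
    """Analyst ratings on an assumed scale of 0 (low) to 3 (high)."""
    error_count: int            # number of operator errors in the incident
    consequence_severity: int   # seriousness of the errors' consequences
    recent_incident_rate: int   # frequency of recent incidents in the system
    existing_safety_data: int   # amount and value of data already available


def investigation_priority(r: IncidentRatings) -> int:
    """Sum the ratings, counting existing safety data inversely."""
    scale_max = 3
    return (
        r.error_count
        + r.consequence_severity
        + r.recent_incident_rate
        + (scale_max - r.existing_safety_data)
    )


many_errors_little_data = IncidentRatings(3, 2, 3, 0)
few_errors_rich_data = IncidentRatings(1, 1, 0, 3)
print(investigation_priority(many_errors_little_data))  # prints 11
print(investigation_priority(few_errors_rich_data))     # prints 2
```

A score of this kind would serve only to rank candidate events against one another; where to set a threshold for launching an investigation remains a management judgment.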





Models of Error, Investigations, and Research 


Moray’s (1994, 2000) and Reason’s (1990, 1997) models of error have guided
much of the process outlined in this text. These models have had considerable
value in helping understand error, and substantial influence on the
insights of students of error. Both have helped bridge the gap that often
exists between theories and their application to error investigations.

Models help researchers and investigators understand the data they have
gathered and the contribution of the data to the overall investigation, and
they serve to guide investigators’ analytical efforts. However, just as models
guide, they can also hinder if investigators rigidly adhere to them to the
detriment of other, more applicable investigative approaches. No single error
model is equally applicable to all circumstances; each may have shortcomings
that are unique to particular circumstances.

As with empirical research, one needs to follow what the data describe
rather than the models or theories used to explain them. This means that
investigations should allow the data to determine the relationships between
antecedents and errors and the relationships between errors and the events
under investigation. Assuming that they have obtained the needed data,
investigators should apply the model or theory that best explains the
relationships of interest to the investigation. Although this text has adopted
Moray’s and Reason’s models to illustrate investigative technique and
methodology, others may be better suited to the needs of an investigation. So
long as the fundamental rules of investigative logic are followed, the derived
relationships and explanations will be sound.


Research and Investigations 


Both research studies and accident investigations can provide data that 
explain behavior in complex systems. For example, as discussed, the exten- 
sive research that has been conducted on decision making in “real world” 
dynamic environments has led to findings that are directly relevant to the 
investigations of many events, helping investigators understand the nature 
of the operator decisions and thus assisting in the development of remedia- 
tion strategies to prevent similar occurrences. Information from accident 
investigations has also helped to focus research needs and activities by 
revealing operator actions in real world settings. Knowing the results of both 
research and investigations of similar accidents and incidents can assist both 
the researcher and the investigator to better understand the issues being 
examined in the particular investigation. 

However, investigators may encounter events where few relevant research 
studies and investigations have been conducted. Circumstances in which 
relatively unexplored issues play major roles in accidents occasionally occur 
and, while the efforts to investigate them may be considerable, the derived 
information may have substantial value to a variety of settings. For exam- 
ple, investigators of the 1996 explosion of the Boeing 747 over Long Island 
(National Transportation Safety Board, 2000), focused on a rarely encoun- 
tered scenario, an in-flight explosion caused by fuel tank vapors that had 
ignited, on which little pertinent information was available. Much of the 
information obtained in that investigation was applied to aircraft design and 
aircraft certification, going a long way to enhancing safety by making this 
event highly unlikely in the future. 

Investigators conducted original research to obtain fundamental information
regarding fuel volatility and ignition sources to understand the phenomenon.
Although it is rare for investigators to conduct original research in
the context of an accident investigation, the needs of the investigation may
require it. The information obtained from the investigation of that Boeing 747
accident will help aircraft designers, regulators, and the aviation industry
improve aviation safety for years to come.


Quick Solutions


Operators, managers, regulators—as well as investigators—may seek quick 
or facile solutions to address a recognized safety deficiency or vulnerabil- 
ity. Frequently, quick solutions are needed and appropriate. However, the 
complexity of modern systems and the relationships among antecedents
and errors within them often call for complex and time-consuming solutions
to provide effective mitigation. Those involved in system
operations need to be prepared to implement long-term, potentially difficult
strategies to improve system safety. The methods may be expensive and/or
difficult to implement, but the objective of improving system safety will
warrant them.





Conclusions 


When beginning an investigation the task may seem arduous, and the
frustrations overwhelming, but the benefits to be gained from systematic and
thorough investigations will make the effort worthwhile. One has only to
examine the steady improvements in system safety to appreciate the benefits
to be gained. For example, aircraft accidents today are so rare that only a
handful of major accidents occur each year, worldwide. Twenty years ago,
it seemed that the same number of aircraft accidents that occur
in a year today would occur in a month, despite the considerable increase in
worldwide flight operations since then.

The potential for human error will not be eliminated. However, inves- 
tigators have demonstrated that opportunities for error can be effectively 
reduced. Complex systems are likely to increase in their complexity and, 
as Perrow (1999) has argued, this will increase the likelihood of “normal 
accidents.” But even Perrow would argue that applying the lessons of error 
investigations reduces their likelihood. The objective of this text has been to 
provide the knowledge and skills investigators need to accomplish this. The 
benefits of doing so will continue to make the efforts worthwhile.